{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.13.0"
    },
    "blog_metadata": {
      "topic": "Should your database be the prompt? Rethinking NL2SQL with SQL MCP Server",
      "slug": "should-your-database-be-the-prompt-rethinking-nl2sql-with-sq",
      "generated_by": "LinkedIn Post Generator + Azure OpenAI",
      "generated_at": "2026-05-15T16:39:48.104Z"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Should your database be the prompt? Rethinking NL2SQL with SQL MCP Server\n",
        "\n",
        "This notebook turns the blog post into a hands-on validation of the core architectural claim: production data access should be mediated through governed tools, not arbitrary NL2SQL. We will simulate constrained tool invocation, parameter validation, trace logging, and a local governed endpoint pattern in Python so the ideas can be tested safely without requiring a live production database."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%pip install -q pandas requests"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from __future__ import annotations\n",
        "\n",
        "import json\n",
        "import os\n",
        "from dataclasses import dataclass\n",
        "from datetime import datetime, timezone\n",
        "from typing import Any, Dict, List, Optional\n",
        "\n",
        "import pandas as pd\n",
        "import requests"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Architecture comparison\n",
        "\n",
        "The blog contrasts direct NL2SQL with a mediated pattern where an agent selects an approved tool, policy validates parameters, and execution happens through a governed interface. The cell below renders that comparison as plain text so the control points are easy to inspect in a notebook."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "architecture_flow = {\n",
        "    \"mediated_access\": [\n",
        "        \"User asks a business question\",\n",
        "        \"LLM Agent interprets intent\",\n",
        "        \"Policy + schema hints constrain choices\",\n",
        "        \"SQL MCP Tool is selected\",\n",
        "        \"Allowed operations only with approved params\",\n",
        "        \"Database / Data API Builder executes governed request\",\n",
        "        \"Rows / aggregates returned\",\n",
        "        \"Trace log captured\",\n",
        "        \"Agent produces grounded natural-language answer\",\n",
        "    ],\n",
        "    \"raw_nl2sql_risk\": [\n",
        "        \"User asks a business question\",\n",
        "        \"Model generates arbitrary SQL\",\n",
        "        \"Database executes or rejects query\",\n",
        "        \"Higher risk: joins, exfiltration, drift, cost spikes\",\n",
        "    ],\n",
        "}\n",
        "\n",
        "print(\"Mediated access pattern:\")\n",
        "for i, step in enumerate(architecture_flow[\"mediated_access\"], start=1):\n",
        "    print(f\"{i}. {step}\")\n",
        "\n",
        "print(\"\\nRaw NL2SQL pattern:\")\n",
        "for i, step in enumerate(architecture_flow[\"raw_nl2sql_risk\"], start=1):\n",
        "    print(f\"{i}. {step}\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Minimal governed agent\n",
        "\n",
        "This example demonstrates the central idea from the post: the agent does not generate raw SQL. Instead, it chooses from a small set of approved operations and calls a constrained tool that returns grounded results."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from typing import Dict, Any, List\n",
        "\n",
        "ALLOWED_OPS = {\"sales_by_region\", \"top_customers\"}\n",
        "\n",
        "def sql_mcp_call(operation: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:\n",
        "    if operation not in ALLOWED_OPS:\n",
        "        raise ValueError(f\"Operation not allowed: {operation}\")\n",
        "    print({\"trace\": \"tool.invoke\", \"tool\": \"sql_mcp\", \"operation\": operation, \"params\": params})\n",
        "    if operation == \"sales_by_region\":\n",
        "        return [{\"region\": \"West\", \"revenue\": 125000}, {\"region\": \"East\", \"revenue\": 98000}]\n",
        "    return [{\"customer\": \"Contoso\", \"revenue\": 42000}, {\"customer\": \"Fabrikam\", \"revenue\": 39000}]\n",
        "\n",
        "def agent(question: str) -> str:\n",
        "    if \"region\" in question.lower():\n",
        "        rows = sql_mcp_call(\"sales_by_region\", {\"year\": 2025})\n",
        "    else:\n",
        "        rows = sql_mcp_call(\"top_customers\", {\"limit\": 2})\n",
        "    return f\"Grounded answer from governed tool result: {rows}\"\n",
        "\n",
        "print(agent(\"Which region has the highest revenue this year?\"))\n",
        "print(agent(\"Show me the top customers\"))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Tool contract and parameter validation\n",
        "\n",
        "A key production control is making the tool contract explicit and validating parameters before execution. This keeps the execution envelope bounded and makes policy easier to audit."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from dataclasses import dataclass\n",
        "from typing import Dict, Any\n",
        "\n",
        "@dataclass(frozen=True)\n",
        "class ToolSpec:\n",
        "    name: str\n",
        "    description: str\n",
        "    params: Dict[str, str]\n",
        "\n",
        "SQL_MCP_SPEC = ToolSpec(\n",
        "    name=\"sql_mcp.sales_by_region\",\n",
        "    description=\"Returns revenue aggregated by region for an approved year.\",\n",
        "    params={\"year\": \"int between 2020 and 2030\"}\n",
        ")\n",
        "\n",
        "def validate_params(params: Dict[str, Any]) -> Dict[str, Any]:\n",
        "    year = int(params[\"year\"])\n",
        "    if year < 2020 or year > 2030:\n",
        "        raise ValueError(\"year out of approved range\")\n",
        "    return {\"year\": year}\n",
        "\n",
        "approved = validate_params({\"year\": \"2025\"})\n",
        "print(SQL_MCP_SPEC)\n",
        "print({\"safe_request\": approved})\n",
        "\n",
        "try:\n",
        "    validate_params({\"year\": \"2035\"})\n",
        "except Exception as e:\n",
        "    print({\"rejected_request\": str(e)})"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Sequence of a governed request\n",
        "\n",
        "The original post included a sequence diagram. Here we represent the same flow as structured Python data so it can be inspected, logged, or tested programmatically."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "sequence = [\n",
        "    (\"User\", \"Agent\", '\"Show top customers this quarter\"'),\n",
        "    (\"Agent\", \"Policy\", \"Resolve allowed tool + params\"),\n",
        "    (\"Policy\", \"Agent\", \"top_customers(limit=5, quarter=Q2)\"),\n",
        "    (\"Agent\", \"SQL MCP Server\", \"invoke tool with approved operation\"),\n",
        "    (\"SQL MCP Server\", \"Database\", \"parameterized query / governed endpoint\"),\n",
        "    (\"Database\", \"SQL MCP Server\", \"result set\"),\n",
        "    (\"SQL MCP Server\", \"Agent\", \"rows + trace metadata\"),\n",
        "    (\"Agent\", \"User\", \"grounded summary with provenance\"),\n",
        "]\n",
        "\n",
        "for sender, receiver, message in sequence:\n",
        "    print(f\"{sender} -> {receiver}: {message}\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Trace logging for observability and provenance\n",
        "\n",
        "Production systems need more than correct answers. They need traceability. This example logs agent and tool events with timestamps so actions can be reviewed after the fact."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json\n",
        "from datetime import datetime, timezone\n",
        "\n",
        "def trace(event: str, **fields) -> None:\n",
        "    payload = {\n",
        "        \"ts\": datetime.now(timezone.utc).isoformat(),\n",
        "        \"event\": event,\n",
        "        **fields,\n",
        "    }\n",
        "    print(json.dumps(payload, separators=(\",\", \":\")))\n",
        "\n",
        "trace(\"agent.start\", question=\"Show top customers this quarter\")\n",
        "trace(\"tool.selected\", tool=\"sql_mcp\", operation=\"top_customers\")\n",
        "trace(\"tool.result\", row_count=2, source=\"approved_view.sales_summary\")\n",
        "trace(\"agent.answer\", grounded=True)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Required environment variables for local or cloud integration\n",
        "\n",
        "If you later connect this notebook to a real SQL MCP Server or a governed API layer, you will typically need variables such as:\n",
        "\n",
        "- `SQLMCP_HOST`\n",
        "- `DAB_CONNECTION_STRING`\n",
        "- `DAB_CONFIG`\n",
        "- `AZURE_TENANT_ID`\n",
        "- `AZURE_CLIENT_ID`\n",
        "- `DB_PASSWORD`\n",
        "- `AZURE_SQL_SERVER`\n",
        "- `AZURE_SQL_DATABASE`\n",
        "\n",
        "The next cells keep everything in Python and only simulate configuration so the notebook remains safe to run locally."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Local proof-of-concept environment setup in Python\n",
        "\n",
        "The blog included PowerShell for setting local environment variables. This Python version mirrors that intent and validates that a password is present before a proof of concept would proceed."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "\n",
        "os.environ.setdefault(\"SQLMCP_HOST\", \"http://localhost:5000\")\n",
        "os.environ.setdefault(\n",
        "    \"DAB_CONNECTION_STRING\",\n",
        "    \"Server=tcp:localhost,1433;Database=SalesDb;User ID=appuser;Password=${DB_PASSWORD};Encrypt=True;TrustServerCertificate=True\",\n",
        ")\n",
        "os.environ.setdefault(\"DAB_CONFIG\", \"./dab-config.json\")\n",
        "os.environ.setdefault(\"AZURE_TENANT_ID\", \"00000000-0000-0000-0000-000000000000\")\n",
        "os.environ.setdefault(\"AZURE_CLIENT_ID\", \"11111111-1111-1111-1111-111111111111\")\n",
        "\n",
        "print(f\"SQLMCP_HOST={os.environ['SQLMCP_HOST']}\")\n",
        "print(f\"DAB_CONFIG={os.environ['DAB_CONFIG']}\")\n",
        "\n",
        "if not os.environ.get(\"DB_PASSWORD\"):\n",
        "    print(\"DB_PASSWORD is not set. For a real PoC, set it before connecting to a database.\")\n",
        "else:\n",
        "    print(\"Environment ready.\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Minimal governed endpoint configuration\n",
        "\n",
        "This Python cell recreates the spirit of the Data API Builder configuration from the post: a read-only endpoint over a single entity. Writing the config as JSON makes the governance boundary explicit and easy to inspect."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json\n",
        "from pathlib import Path\n",
        "\n",
        "config = {\n",
        "    \"$schema\": \"https://github.com/Azure/data-api-builder/releases/latest/download/dab.draft.schema.json\",\n",
        "    \"data-source\": {\n",
        "        \"database-type\": \"mssql\",\n",
        "        \"connection-string\": os.environ.get(\"DAB_CONNECTION_STRING\", \"\")\n",
        "    },\n",
        "    \"runtime\": {\n",
        "        \"rest\": {\"enabled\": True, \"path\": \"/api\"},\n",
        "        \"host\": {\"mode\": \"development\"}\n",
        "    },\n",
        "    \"entities\": {\n",
        "        \"SalesSummary\": {\n",
        "            \"source\": \"dbo.SalesSummary\",\n",
        "            \"permissions\": [\n",
        "                {\"role\": \"anonymous\", \"actions\": [\"read\"]}\n",
        "            ]\n",
        "        }\n",
        "    }\n",
        "}\n",
        "\n",
        "config_path = Path(\"./dab-config.json\")\n",
        "config_path.write_text(json.dumps(config, indent=2), encoding=\"utf-8\")\n",
        "print(config_path.read_text(encoding=\"utf-8\"))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Simulating a governed endpoint response\n",
        "\n",
        "Instead of starting external services from the notebook, this cell simulates what a governed endpoint might return. The goal is to validate the application-side pattern: approved endpoint, bounded result shape, and simple row counting."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "mock_response = {\n",
        "    \"value\": [\n",
        "        {\"region\": \"West\", \"revenue\": 125000, \"year\": 2025},\n",
        "        {\"region\": \"East\", \"revenue\": 98000, \"year\": 2025},\n",
        "        {\"region\": \"Central\", \"revenue\": 87000, \"year\": 2025},\n",
        "    ]\n",
        "}\n",
        "\n",
        "rows = mock_response.get(\"value\", [])\n",
        "if rows is None:\n",
        "    raise RuntimeError(\"No response from governed endpoint simulation.\")\n",
        "\n",
        "print(f\"Rows returned: {len(rows)}\")\n",
        "pd.DataFrame(rows)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Azure-ready secure connection pattern in Python\n",
        "\n",
        "The post also showed a cloud-oriented pattern using secret retrieval. In this notebook we simulate that flow by reading from environment variables and constructing a non-secret summary of the target resources."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "\n",
        "kv_name = os.environ.get(\"KEY_VAULT_NAME\", \"demo-governed-data-kv\")\n",
        "sql_server = os.environ.get(\"AZURE_SQL_SERVER\", \"demo-sql-server.database.windows.net\")\n",
        "db_name = os.environ.get(\"AZURE_SQL_DATABASE\", \"SalesDb\")\n",
        "secret_connection_string = os.environ.get(\"DAB_CONNECTION_STRING\")\n",
        "\n",
        "print(\"Using secret-backed connection pattern (simulated).\")\n",
        "print({\n",
        "    \"key_vault\": kv_name,\n",
        "    \"target_server\": sql_server,\n",
        "    \"target_database\": db_name,\n",
        "    \"has_connection_string\": bool(secret_connection_string),\n",
        "})"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## End-to-end validation: intent, policy, tool, and provenance\n",
        "\n",
        "This final executable example combines the notebook's ideas into one small control plane. The agent resolves intent, policy validates parameters, the governed tool executes only approved operations, and the answer includes provenance."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from dataclasses import dataclass\n",
        "from typing import Any, Dict, List\n",
        "from datetime import datetime, timezone\n",
        "import json\n",
        "\n",
        "@dataclass(frozen=True)\n",
        "class GovernedTool:\n",
        "    name: str\n",
        "    allowed_ops: set\n",
        "\n",
        "TOOL = GovernedTool(name=\"sql_mcp\", allowed_ops={\"sales_by_region\", \"top_customers\"})\n",
        "\n",
        "SALES_DATA = [\n",
        "    {\"region\": \"West\", \"revenue\": 125000, \"year\": 2025},\n",
        "    {\"region\": \"East\", \"revenue\": 98000, \"year\": 2025},\n",
        "    {\"region\": \"Central\", \"revenue\": 87000, \"year\": 2025},\n",
        "]\n",
        "TOP_CUSTOMERS = [\n",
        "    {\"customer\": \"Contoso\", \"revenue\": 42000, \"quarter\": \"Q2\"},\n",
        "    {\"customer\": \"Fabrikam\", \"revenue\": 39000, \"quarter\": \"Q2\"},\n",
        "    {\"customer\": \"Northwind\", \"revenue\": 31000, \"quarter\": \"Q2\"},\n",
        "]\n",
        "\n",
        "def emit(event: str, **fields: Any) -> None:\n",
        "    payload = {\"ts\": datetime.now(timezone.utc).isoformat(), \"event\": event, **fields}\n",
        "    print(json.dumps(payload, separators=(\",\", \":\")))\n",
        "\n",
        "def resolve_intent(question: str) -> Dict[str, Any]:\n",
        "    q = question.lower()\n",
        "    if \"region\" in q:\n",
        "        return {\"operation\": \"sales_by_region\", \"params\": {\"year\": 2025}}\n",
        "    return {\"operation\": \"top_customers\", \"params\": {\"limit\": 2, \"quarter\": \"Q2\"}}\n",
        "\n",
        "def validate_policy(operation: str, params: Dict[str, Any]) -> Dict[str, Any]:\n",
        "    if operation not in TOOL.allowed_ops:\n",
        "        raise ValueError(f\"Operation not allowed: {operation}\")\n",
        "    if operation == \"sales_by_region\":\n",
        "        year = int(params[\"year\"])\n",
        "        if not 2020 <= year <= 2030:\n",
        "            raise ValueError(\"year out of approved range\")\n",
        "        return {\"year\": year}\n",
        "    limit = int(params.get(\"limit\", 5))\n",
        "    if limit < 1 or limit > 10:\n",
        "        raise ValueError(\"limit out of approved range\")\n",
        "    quarter = str(params.get(\"quarter\", \"Q1\"))\n",
        "    if quarter not in {\"Q1\", \"Q2\", \"Q3\", \"Q4\"}:\n",
        "        raise ValueError(\"quarter invalid\")\n",
        "    return {\"limit\": limit, \"quarter\": quarter}\n",
        "\n",
        "def governed_execute(operation: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:\n",
        "    emit(\"tool.invoke\", tool=TOOL.name, operation=operation, params=params)\n",
        "    if operation == \"sales_by_region\":\n",
        "        return [row for row in SALES_DATA if row[\"year\"] == params[\"year\"]]\n",
        "    rows = [row for row in TOP_CUSTOMERS if row[\"quarter\"] == params[\"quarter\"]]\n",
        "    return rows[: params[\"limit\"]]\n",
        "\n",
        "def answer_question(question: str) -> Dict[str, Any]:\n",
        "    emit(\"agent.start\", question=question)\n",
        "    intent = resolve_intent(question)\n",
        "    emit(\"intent.resolved\", **intent)\n",
        "    approved_params = validate_policy(intent[\"operation\"], intent[\"params\"])\n",
        "    emit(\"policy.approved\", operation=intent[\"operation\"], params=approved_params)\n",
        "    rows = governed_execute(intent[\"operation\"], approved_params)\n",
        "    emit(\"tool.result\", row_count=len(rows), source=\"approved_contract\")\n",
        "    return {\n",
        "        \"question\": question,\n",
        "        \"operation\": intent[\"operation\"],\n",
        "        \"approved_params\": approved_params,\n",
        "        \"rows\": rows,\n",
        "        \"provenance\": {\n",
        "            \"tool\": TOOL.name,\n",
        "            \"source\": \"approved_contract\",\n",
        "            \"grounded\": True,\n",
        "        },\n",
        "    }\n",
        "\n",
        "result_1 = answer_question(\"Which region has the highest revenue this year?\")\n",
        "result_2 = answer_question(\"Show top customers this quarter\")\n",
        "\n",
        "print(\"\\nResult 1:\")\n",
        "print(json.dumps(result_1, indent=2))\n",
        "print(\"\\nResult 2:\")\n",
        "print(json.dumps(result_2, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
        "This notebook validated the blog's main argument in executable form: the safer production pattern is not arbitrary NL2SQL, but mediated access through approved tools, validated parameters, policy checks, and trace logging. The examples showed how to separate intent resolution from execution authority and how to preserve provenance around every data action.\n",
        "\n",
        "## Next Steps\n",
        "\n",
        "1. Replace the mock datasets with a read-only governed API or SQL MCP-compatible endpoint in a non-production environment.\n",
        "2. Add stronger policy checks for tenant scope, row limits, masking, and user entitlements.\n",
        "3. Connect the trace events to your observability stack for audit and post-hoc review.\n",
        "4. Introduce a semantic layer so business terms like revenue, active customer, and quarter are resolved outside the prompt.\n",
        "5. Test failure modes explicitly: prompt injection, over-broad requests, invalid parameters, and cost-control violations."
      ]
    }
  ]
}