{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.13.0"
    },
    "blog_metadata": {
      "topic": "Building Governed AI Data Products with the Public Fabric Data Agent API",
      "slug": "building-governed-ai-data-products-with-the-public-fabric-da",
      "generated_by": "LinkedIn Post Generator + Azure OpenAI",
      "generated_at": "2026-07-02T12:39:14.696Z"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Building Governed AI Data Products with the Public Fabric Data Agent API\n",
        "\n",
        "This notebook turns the blog walkthrough into a hands-on validation flow for building a governed interface around Microsoft Fabric Data Agents. It focuses on practical checkpoints: prerequisites, identity, token acquisition, API invocation, correlation IDs, audit logging, and denial-path testing before production rollout."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%pip install -q msal requests"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "import re\n",
        "import json\n",
        "import time\n",
        "import uuid\n",
        "from datetime import datetime, timezone\n",
        "\n",
        "import requests\n",
        "import msal"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Governance framing and implementation order\n",
        "\n",
        "Treat the Fabric data agent as a governed data product interface, not just a chat feature. The key controls to validate are approved source scope, ownership, access model, audit requirements, operational expectations, and lifecycle policy.\n",
        "\n",
        "A practical implementation order is:\n",
        "\n",
        "1. Create curated source assets in Fabric.\n",
        "2. Create the data agent and attach only approved sources.\n",
        "3. Confirm workspace and source permissions.\n",
        "4. Register the calling app in Microsoft Entra ID.\n",
        "5. Acquire a Fabric API token.\n",
        "6. Invoke the agent with correlation IDs.\n",
        "7. Inspect the response and capture request metadata.\n",
        "8. Test denial paths.\n",
        "9. Log requests and outcomes for audit.\n",
        "10. Promote through environments with approval and rollback steps."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Reference architecture\n",
        "\n",
        "The production pattern includes Fabric source assets, a Fabric data agent scoped to approved assets, an API consumer, Microsoft Entra ID for token issuance, a governance plane for lineage and policy, and a monitoring plane for logs and investigation.\n",
        "\n",
        "Flow:\n",
        "\n",
        "Enterprise App -> Acquire Entra ID Token -> Fabric Data Agent API -> Governed Data Product -> Policy / Lineage / Workspace Controls\n",
        "\n",
        "Fabric Data Agent API -> Response + Correlation ID -> App Logging / Audit Trail"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Validate workspace and rollout assumptions\n",
        "\n",
        "The original post used PowerShell for preflight checks. This notebook implements the same validation in Python so it can run in place. The cell below checks identifier formats, the Python runtime, and the required packages, so basic setup problems surface before you start debugging authentication or API behavior. Required environment variables are checked in a later cell."
      ]
    },
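    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import sys\n",
        "import importlib.util\n",
        "\n",
        "WORKSPACE_ID = os.getenv('FABRIC_WORKSPACE_ID', 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee')\n",
        "CAPACITY_ID = os.getenv('FABRIC_CAPACITY_ID', 'bbbbbbbb-cccc-dddd-eeee-ffffffffffff')\n",
        "\n",
        "# Strict 8-4-4-4-12 GUID pattern; a bare character class would also accept 36 hyphens.\n",
        "GUID_PATTERN = re.compile(r'^[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$')\n",
        "\n",
        "checks = {\n",
        "    'WorkspaceIdFormat': bool(GUID_PATTERN.match(WORKSPACE_ID)),\n",
        "    'CapacityIdFormat': bool(GUID_PATTERN.match(CAPACITY_ID)),\n",
        "    # Assumed minimum version; adjust to the runtime your organization supports.\n",
        "    'PythonVersionSupported': sys.version_info >= (3, 9),\n",
        "    # find_spec returns None when a package cannot be located.\n",
        "    'RequestsInstalled': importlib.util.find_spec('requests') is not None,\n",
        "    'MsalInstalled': importlib.util.find_spec('msal') is not None,\n",
        "}\n",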
        "\n",
        "print(json.dumps(checks, indent=2))\n",
        "\n",
        "if not all(checks.values()):\n",
        "    raise RuntimeError('Environment prerequisite validation failed.')"
      ],
      "execution_count": null,
      "outputs": []
    },
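    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Required environment variables\n",
        "\n",
        "Set these before attempting token acquisition or live API calls:\n",
        "\n",
        "- `TENANT_ID`\n",
        "- `CLIENT_ID`\n",
        "- `CLIENT_SECRET`\n",
        "- `FABRIC_WORKSPACE_ID`\n",
        "- `FABRIC_DATA_AGENT_ID`\n",
        "\n",
        "Optional:\n",
        "\n",
        "- `FABRIC_CAPACITY_ID` (used only by the preflight format check)\n",
        "- `FABRIC_ACCESS_TOKEN` (a pre-acquired token, used when no token was acquired in this session)\n",
        "- `FABRIC_API_URL` (defaults to `https://api.fabric.microsoft.com/v1/dataAgents/query`, an illustrative path to confirm against current Microsoft documentation)\n",
        "- `FABRIC_SCOPE` (defaults to `https://api.fabric.microsoft.com/.default`)\n",
        "- `FABRIC_QUESTION` (defaults to a sample sales-variance question)"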
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Check app registration configuration assumptions\n",
        "\n",
        "This cell validates that the minimum Entra app registration values are available in the environment. It is the notebook equivalent of the rollout-readiness check from the blog post."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "required_env = ['TENANT_ID', 'CLIENT_ID', 'CLIENT_SECRET']\n",
        "missing = [name for name in required_env if not os.getenv(name)]\n",
        "\n",
        "if missing:\n",
        "    print(f'Missing required environment variables: {\", \".join(missing)}')\n",
        "else:\n",
        "    print('Environment variables present.')\n",
        "    print('Next step: acquire a token and validate Fabric permissions.')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Acquire an Entra ID token for the Fabric API\n",
        "\n",
        "This example validates one milestone only: whether the calling app can authenticate and obtain a token for the Fabric API. Verify the current Fabric resource and scope requirements in Microsoft documentation for your tenant before using this against production."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "TENANT_ID = os.getenv('TENANT_ID', 'contoso.onmicrosoft.com')\n",
        "CLIENT_ID = os.getenv('CLIENT_ID', '11111111-2222-3333-4444-555555555555')\n",
        "CLIENT_SECRET = os.getenv('CLIENT_SECRET', 'your-client-secret')\n",
        "SCOPE = [os.getenv('FABRIC_SCOPE', 'https://api.fabric.microsoft.com/.default')]\n",
        "\n",
        "app = msal.ConfidentialClientApplication(\n",
        "    CLIENT_ID,\n",
        "    authority=f'https://login.microsoftonline.com/{TENANT_ID}',\n",
        "    client_credential=CLIENT_SECRET,\n",
        ")\n",
        "\n",
        "result = app.acquire_token_for_client(scopes=SCOPE)\n",
        "\n",
        "if 'access_token' in result:\n",
        "    access_token = result['access_token']\n",
        "    # Never print token material, even truncated; report only non-sensitive metadata.\n",
        "    print(f\"Token acquired; expires in {result.get('expires_in')} seconds.\")\n",
        "else:\n",
        "    access_token = None\n",
        "    print(json.dumps(result, indent=2))\n",
        "    raise RuntimeError(f\"Token acquisition failed: {result.get('error_description', result)}\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Invoke the Fabric Data Agent API with correlation IDs\n",
        "\n",
        "This example sends a request with a client correlation ID and captures enough response metadata to support troubleshooting and audit. Because endpoint paths and payload schema can evolve, treat this as a validation pattern and confirm the exact contract in current Microsoft Learn documentation."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "ACCESS_TOKEN = globals().get('access_token') or os.getenv('FABRIC_ACCESS_TOKEN', 'eyJ...')\n",
        "API_URL = os.getenv('FABRIC_API_URL', 'https://api.fabric.microsoft.com/v1/dataAgents/query')\n",
        "WORKSPACE_ID = os.getenv('FABRIC_WORKSPACE_ID', 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee')\n",
        "DATA_AGENT_ID = os.getenv('FABRIC_DATA_AGENT_ID', 'ffffffff-1111-2222-3333-444444444444')\n",
        "QUESTION = os.getenv('FABRIC_QUESTION', 'Summarize sales variance by region for the last quarter.')\n",
        "CLIENT_REQUEST_ID = str(uuid.uuid4())\n",
        "\n",
        "headers = {\n",
        "    'Authorization': f'Bearer {ACCESS_TOKEN}',\n",
        "    'Content-Type': 'application/json',\n",
        "    'x-ms-client-request-id': CLIENT_REQUEST_ID,\n",
        "}\n",
        "\n",
        "payload = {\n",
        "    'workspaceId': WORKSPACE_ID,\n",
        "    'dataAgentId': DATA_AGENT_ID,\n",
        "    'question': QUESTION,\n",
        "}\n",
        "\n",
        "response = None\n",
        "latency_ms = None\n",
        "start = time.perf_counter()  # monotonic clock, unaffected by system clock changes\n",
        "\n",
        "try:\n",
        "    response = requests.post(API_URL, headers=headers, json=payload, timeout=30)\n",
        "    latency_ms = round((time.perf_counter() - start) * 1000, 2)\n",
        "    response.raise_for_status()\n",
        "    response_json = response.json() if response.content else {}\n",
        "    print(json.dumps({\n",
        "        'status': response.status_code,\n",
        "        'latencyMs': latency_ms,\n",
        "        'clientRequestId': CLIENT_REQUEST_ID,\n",
        "        'platformRequestId': response.headers.get('x-ms-request-id'),\n",
        "        'body': response_json,\n",
        "    }, indent=2))\n",
        "except requests.HTTPError:\n",
        "    # Non-2xx status: a response exists, so capture its correlation metadata.\n",
        "    print(json.dumps({\n",
        "        'error': 'http_error',\n",
        "        'status': response.status_code,\n",
        "        'latencyMs': latency_ms,\n",
        "        'clientRequestId': CLIENT_REQUEST_ID,\n",
        "        'requestId': response.headers.get('x-ms-request-id'),\n",
        "        'details': response.text,\n",
        "    }, indent=2))\n",
        "    raise\n",
        "except requests.RequestException as ex:\n",
        "    # Timeouts, connection failures, and other request-level errors.\n",
        "    latency_ms = round((time.perf_counter() - start) * 1000, 2)\n",
        "    print(json.dumps({\n",
        "        'error': 'request_error',\n",
        "        'latencyMs': latency_ms,\n",
        "        'clientRequestId': CLIENT_REQUEST_ID,\n",
        "        'details': str(ex),\n",
        "    }, indent=2))\n",
        "    raise"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## REST client equivalent for quick testing\n",
        "\n",
        "The blog also included a REST example. This cell renders the equivalent HTTP request so you can compare notebook behavior with Postman, VS Code REST Client, or another API tool while preserving governance-friendly headers."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "rest_example = f'''POST {os.getenv(\"FABRIC_API_URL\", \"https://api.fabric.microsoft.com/v1/dataAgents/query\")} HTTP/1.1\n",
        "Authorization: Bearer {{{{access_token}}}}\n",
        "Content-Type: application/json\n",
        "x-ms-client-request-id: {CLIENT_REQUEST_ID if 'CLIENT_REQUEST_ID' in globals() else '8d7d4f6d-2f0d-4f0a-a4f2-7d6d7f9f0a11'}\n",
        "\n",
        "{{\n",
        "  \"workspaceId\": \"{os.getenv('FABRIC_WORKSPACE_ID', 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee')}\",\n",
        "  \"dataAgentId\": \"{os.getenv('FABRIC_DATA_AGENT_ID', 'ffffffff-1111-2222-3333-444444444444')}\",\n",
        "  \"question\": \"Which governed data product contains customer churn KPIs?\"\n",
        "}}'''\n",
        "\n",
        "print(rest_example)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Inspect the response like an operator\n",
        "\n",
        "Do not stop at a successful status code. Inspect response status, body shape, platform request IDs, your own client request ID, latency, and any source or context metadata returned by the API. This is where traceability begins."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Look the response up once; it is None if the invocation cell has not run or no response arrived.\n",
        "resp = globals().get('response')\n",
        "\n",
        "inspection = {\n",
        "    'responseStatus': resp.status_code if resp is not None else None,\n",
        "    'clientRequestId': globals().get('CLIENT_REQUEST_ID'),\n",
        "    'platformRequestId': resp.headers.get('x-ms-request-id') if resp is not None else None,\n",
        "    'latencyMs': globals().get('latency_ms'),\n",
        "    'contentType': resp.headers.get('Content-Type') if resp is not None else None,\n",
        "    'bodyPreview': (resp.text[:500] if resp is not None and resp.text else None),\n",
        "}\n",
        "\n",
        "print(json.dumps(inspection, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Add minimal audit logging\n",
        "\n",
        "This example writes a JSON Lines audit record capturing the minimum metadata needed to answer who called the agent, which governed interface was used, and whether the request succeeded."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Classify the outcome explicitly so denied or failed calls are not recorded as 'unknown'.\n",
        "resp = globals().get('response')\n",
        "if resp is None:\n",
        "    outcome = 'no_response'\n",
        "elif 200 <= resp.status_code < 300:\n",
        "    outcome = 'success'\n",
        "else:\n",
        "    outcome = 'failure'\n",
        "\n",
        "audit_record = {\n",
        "    'timestampUtc': datetime.now(timezone.utc).isoformat(),\n",
        "    'callerClientId': os.getenv('CLIENT_ID'),  # identifies the calling app registration\n",
        "    'workspaceId': os.getenv('FABRIC_WORKSPACE_ID', 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'),\n",
        "    'dataAgentId': os.getenv('FABRIC_DATA_AGENT_ID', 'ffffffff-1111-2222-3333-444444444444'),\n",
        "    'clientRequestId': CLIENT_REQUEST_ID if 'CLIENT_REQUEST_ID' in globals() else '8d7d4f6d-2f0d-4f0a-a4f2-7d6d7f9f0a11',\n",
        "    'platformRequestId': resp.headers.get('x-ms-request-id') if resp is not None else None,\n",
        "    'userPromptCategory': 'sales-analytics',\n",
        "    'responseStatus': outcome,\n",
        "    'httpStatus': resp.status_code if resp is not None else None,\n",
        "    'latencyMs': latency_ms if 'latency_ms' in globals() else None,\n",
        "}\n",
        "\n",
        "log_path = 'fabric_data_agent_audit.jsonl'\n",
        "with open(log_path, 'a', encoding='utf-8') as log_file:\n",
        "    log_file.write(json.dumps(audit_record) + '\\n')\n",
        "\n",
        "print(f'Audit record written to {log_path}.')\n",
        "print(json.dumps(audit_record, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Denial-path test harness\n",
        "\n",
        "Production readiness requires testing more than happy paths. This cell defines a lightweight test matrix for personas and scenarios such as unauthorized access, malformed requests, and scope violations. It is a planning and execution scaffold you can adapt to your environment."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "test_matrix = [\n",
        "    {\n",
        "        'persona': 'agent_owner',\n",
        "        'scenario': 'approved user can query approved governed sources',\n",
        "        'expected': 'success',\n",
        "    },\n",
        "    {\n",
        "        'persona': 'authorized_analyst',\n",
        "        'scenario': 'response returns with expected metadata and correlation IDs',\n",
        "        'expected': 'success',\n",
        "    },\n",
        "    {\n",
        "        'persona': 'unauthorized_employee',\n",
        "        'scenario': 'unauthorized user cannot access the agent',\n",
        "        'expected': 'denied',\n",
        "    },\n",
        "    {\n",
        "        'persona': 'service_identity',\n",
        "        'scenario': 'authorized app cannot query outside approved scope',\n",
        "        'expected': 'denied',\n",
        "    },\n",
        "    {\n",
        "        'persona': 'security_reviewer',\n",
        "        'scenario': 'malformed requests fail with actionable errors',\n",
        "        'expected': 'error_with_traceability',\n",
        "    },\n",
        "    {\n",
        "        'persona': 'security_reviewer',\n",
        "        'scenario': 'blocked outbound access behaves as expected',\n",
        "        'expected': 'controlled_failure',\n",
        "    },\n",
        "    {\n",
        "        'persona': 'security_reviewer',\n",
        "        'scenario': 'retries do not create duplicate operational side effects',\n",
        "        'expected': 'idempotent_or_safe',\n",
        "    },\n",
        "]\n",
        "\n",
        "print(json.dumps(test_matrix, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Broker decision guidance\n",
        "\n",
        "Direct invocation may be enough for one consumer. When multiple downstream consumers need the agent, a broker or gateway pattern often becomes the cleaner operational model because it can standardize token handling, request validation, throttling, logging, source disclosure, fallback behavior, and user-context preservation.\n",
        "\n",
        "Sequence:\n",
        "\n",
        "Enterprise Client -> Entra ID: Request app token\n",
        "\n",
        "Entra ID -> Enterprise Client: Access token\n",
        "\n",
        "Enterprise Client -> Fabric Data Agent API: Query + x-ms-client-request-id\n",
        "\n",
        "Fabric Data Agent API -> Governance Controls: Enforce workspace/policy context\n",
        "\n",
        "Governance Controls -> Fabric Data Agent API: Approved governed access\n",
        "\n",
        "Fabric Data Agent API -> Enterprise Client: Response + request ID\n",
        "\n",
        "Enterprise Client -> Enterprise Client: Persist logs and audit metadata"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Production readiness checklist\n",
        "\n",
        "Use this checklist before rollout:\n",
        "\n",
        "- Curated Fabric sources are approved and documented.\n",
        "- Data agent scope is limited to approved assets.\n",
        "- Workspace and source permissions are validated.\n",
        "- Entra app registration is approved and least-privilege.\n",
        "- Token acquisition is tested in the target environment.\n",
        "- API path and payload are verified against current Microsoft docs.\n",
        "- Client request IDs are sent on every call.\n",
        "- Platform request IDs are captured when returned.\n",
        "- Audit metadata is written to centralized logging.\n",
        "- Unauthorized personas are denied as expected.\n",
        "- Retry, timeout, and fallback behavior are tested.\n",
        "- Outbound access assumptions are validated.\n",
        "- Promotion across dev, test, and prod is documented.\n",
        "- Ownership for support, governance, and incident response is assigned."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
        "The main shift is to treat Fabric Data Agents as governed interfaces over curated Fabric assets rather than standalone chat experiences. Once the agent is callable outside the Fabric UI, identity, scope, traceability, denial-path validation, and operational ownership become part of the product.\n",
        "\n",
        "## Next Steps\n",
        "\n",
        "1. Replace placeholder IDs and secrets with real environment-specific values.\n",
        "2. Verify the current Fabric Data Agent API path, payload, and auth scope in Microsoft Learn.\n",
        "3. Run the token acquisition and invocation cells in a non-production workspace first.\n",
        "4. Extend audit logging to your centralized logging platform.\n",
        "5. Execute the denial-path matrix with real personas before broader rollout.\n",
        "6. Decide whether direct access is sufficient or whether a broker should sit in front of the API."
      ]
    }
  ]
}