{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.13.0"
    },
    "blog_metadata": {
      "topic": "Build an Enterprise ready 2nd Brain on Azure Foundry + Cosmos DB",
      "slug": "build-an-enterprise-ready-2nd-brain-on-azure-foundry-cosmos-",
      "generated_by": "LinkedIn Post Generator + Azure OpenAI",
      "generated_at": "2026-05-02T01:15:34.188Z"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Build an Enterprise-ready 2nd Brain on Azure Foundry + Cosmos DB\n",
        "\n",
        "This notebook turns the architecture from the blog post into a hands-on validation flow. It focuses on a production-oriented pattern that combines Azure AI Foundry for chat and embeddings, Azure Cosmos DB for durable knowledge and memory, and Azure API Management for governance.\n",
        "\n",
        "The goal is not just to demo RAG, but to validate the core building blocks of an enterprise second brain: ingestion, retrieval, memory, access control, and governed API exposure."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%pip install -q azure-identity azure-cosmos openai python-dotenv numpy"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "import time\n",
        "import uuid\n",
        "import json\n",
        "import math\n",
        "import hashlib\n",
        "from typing import List, Dict, Any\n",
        "\n",
        "import numpy as np\n",
        "from azure.identity import DefaultAzureCredential\n",
        "from azure.cosmos import CosmosClient\n",
        "from openai import AzureOpenAI"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Architecture overview\n",
        "\n",
        "The enterprise-ready pattern separates concerns across services:\n",
        "\n",
        "- Azure AI Foundry handles chat and embeddings\n",
        "- Azure Cosmos DB stores chunks, memory, preferences, and citations\n",
        "- Azure API Management acts as the governed front door\n",
        "- Azure identity and RBAC enforce access boundaries\n",
        "\n",
        "The flow is retrieval-first: ingest documents, chunk and embed them, store them with metadata and ACL tags, retrieve authorized chunks at query time, then generate grounded answers with citations."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "architecture = {\n",
        "    \"users_and_apps\": \"Azure API Management\",\n",
        "    \"api_layer\": \"Python API / Azure Functions / App Service\",\n",
        "    \"ai_control_plane\": [\"Azure AI Foundry Chat\", \"Azure AI Foundry Embeddings\"],\n",
        "    \"state_store\": \"Azure Cosmos DB\",\n",
        "    \"security\": [\"Managed Identity\", \"Key Vault\", \"Entra ID\", \"RBAC\"],\n",
        "    \"ingestion\": [\"Blob Storage or enterprise content source\", \"Chunking job\", \"Embedding generation\", \"Cosmos upsert\"]\n",
        "}\n",
        "\n",
        "print(json.dumps(architecture, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Azure foundation provisioning baseline\n",
        "\n",
        "The original post uses PowerShell and Azure CLI to provision the resource group, VNet, Cosmos DB, and APIM. In this notebook, we keep the example executable in Python by generating the equivalent Azure CLI commands, which you can review and run separately.\n",
        "\n",
        "This validates the infrastructure baseline without requiring the notebook kernel to execute shell provisioning automatically."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "location = \"eastus\"\n",
        "rg = \"rg-2ndbrain-demo\"\n",
        "cosmos = \"cosmos2ndbrain1234\"\n",
        "apim = \"apim-2ndbrain-demo\"\n",
        "vnet = \"vnet-2ndbrain\"\n",
        "subnet = \"snet-app\"\n",
        "\n",
        "commands = [\n",
        "    f\"az group create -n {rg} -l {location}\",\n",
        "    f\"az network vnet create -g {rg} -n {vnet} --address-prefix 10.10.0.0/16 --subnet-name {subnet} --subnet-prefix 10.10.1.0/24\",\n",
        "    f\"az cosmosdb create -g {rg} -n {cosmos} --kind GlobalDocumentDB --default-consistency-level Session --enable-free-tier true\",\n",
        "    f\"az cosmosdb sql database create -g {rg} -a {cosmos} -n brain\",\n",
        "    f\"az cosmosdb sql container create -g {rg} -a {cosmos} -d brain -n chunks --partition-key-path /tenantId --ttl -1\",\n",
        "    f\"az cosmosdb sql container create -g {rg} -a {cosmos} -d brain -n memory --partition-key-path /userId --ttl 2592000\",\n",
        "    f\"az apim create -g {rg} -n {apim} --publisher-name Contoso --publisher-email admin@contoso.com --sku-name Consumption\"\n",
        "]\n",
        "\n",
        "print(\"\\n\".join(commands))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Identity and additional Cosmos DB containers\n",
        "\n",
        "A production baseline should avoid long-lived keys and prefer Entra-backed access with managed identity. The original example also adds containers for preferences, conversations, and citations.\n",
        "\n",
        "This Python cell renders the equivalent Azure CLI commands for review."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "rg = \"rg-2ndbrain-demo\"\n",
        "cosmos = \"<your-cosmos-account>\"\n",
        "principal_id = \"<entra-object-id-or-managed-identity-principal-id>\"\n",
        "\n",
        "commands = [\n",
        "    f\"az cosmosdb sql role assignment create -g {rg} -a {cosmos} --role-definition-name 'Cosmos DB Built-in Data Contributor' --scope '/' --principal-id {principal_id}\",\n",
        "    f\"az cosmosdb sql container create -g {rg} -a {cosmos} -d brain -n preferences --partition-key-path /userId --ttl -1\",\n",
        "    f\"az cosmosdb sql container create -g {rg} -a {cosmos} -d brain -n conversations --partition-key-path /conversationId --ttl 604800\",\n",
        "    f\"az cosmosdb sql container create -g {rg} -a {cosmos} -d brain -n citations --partition-key-path /tenantId --ttl -1\"\n",
        "]\n",
        "\n",
        "print(\"\\n\".join(commands))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Data model validation\n",
        "\n",
        "Before writing ingestion or retrieval code, validate the shape of the chunk documents. The schema should include tenant, document, source, ACL, and embedding version metadata so retrieval can be filtered, explained, and reprocessed safely."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "sample_chunk = {\n",
        "    \"id\": \"contoso|handbook-001|00012\",\n",
        "    \"tenantId\": \"contoso\",\n",
        "    \"docId\": \"handbook-001\",\n",
        "    \"chunkId\": \"00012\",\n",
        "    \"text\": \"Azure Foundry helps teams build governed copilots with deployment-based model access.\",\n",
        "    \"embedding\": [0.0123, -0.0456, 0.0789],\n",
        "    \"sourceUri\": \"https://storageaccount.blob.core.windows.net/docs/handbook.pdf\",\n",
        "    \"source\": \"handbook.pdf\",\n",
        "    \"documentType\": \"policy\",\n",
        "    \"businessDomain\": \"it\",\n",
        "    \"aclTags\": [\"group:it-admins\", \"region:us\"],\n",
        "    \"classification\": \"internal\",\n",
        "    \"embeddingModel\": \"your-embedding-deployment\",\n",
        "    \"embeddingVersion\": \"2025-01\",\n",
        "    \"contentHash\": \"sha256:abc123\",\n",
        "    \"createdAt\": \"2025-05-01T12:00:00Z\",\n",
        "    \"updatedAt\": \"2025-05-01T12:00:00Z\"\n",
        "}\n",
        "\n",
        "required_fields = [\n",
        "    \"id\", \"tenantId\", \"docId\", \"chunkId\", \"text\", \"embedding\", \"sourceUri\",\n",
        "    \"source\", \"documentType\", \"businessDomain\", \"aclTags\", \"embeddingModel\",\n",
        "    \"embeddingVersion\", \"contentHash\"\n",
        "]\n",
        "\n",
        "missing = [f for f in required_fields if f not in sample_chunk]\n",
        "print(\"Missing fields:\", missing)\n",
        "print(json.dumps(sample_chunk, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
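    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `contentHash` field is what makes safe reprocessing possible: if the hash of the current chunk text matches the stored one, re-embedding can be skipped. Below is a minimal sketch of that check, assuming the `sha256:`-prefixed format from the sample document; `needs_reembedding` is an illustrative helper, not part of any SDK."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import hashlib\n",
        "\n",
        "# Sketch: compare the stored contentHash with a freshly computed hash of\n",
        "# the chunk text; only changed chunks need a new embedding call.\n",
        "def needs_reembedding(existing_doc: dict, new_text: str) -> bool:\n",
        "    new_hash = \"sha256:\" + hashlib.sha256(new_text.encode(\"utf-8\")).hexdigest()\n",
        "    return existing_doc.get(\"contentHash\") != new_hash\n",
        "\n",
        "stored = {\"contentHash\": \"sha256:\" + hashlib.sha256(\"old text\".encode(\"utf-8\")).hexdigest()}\n",
        "print(needs_reembedding(stored, \"old text\"))  # unchanged -> False\n",
        "print(needs_reembedding(stored, \"new text\"))  # changed -> True"
      ],
      "execution_count": null,
      "outputs": []
    },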
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Required environment variables\n",
        "\n",
        "The next examples connect to Azure AI Foundry and Azure Cosmos DB. Set these environment variables before running live validation:\n",
        "\n",
        "- `AZURE_OPENAI_ENDPOINT`\n",
        "- `COSMOS_URI`\n",
        "- `AZURE_OPENAI_KEY` if using key-based auth for quick testing\n",
        "- `COSMOS_KEY` if using key-based auth for quick testing\n",
        "\n",
        "For production-oriented validation, prefer `DefaultAzureCredential` and managed identity instead of account keys."
      ]
    },
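    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A quick sanity check can catch missing configuration before any live cell runs. This cell only inspects environment variables and makes no Azure calls; the variable names are the ones listed above."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "\n",
        "# Endpoint variables are required for live mode; the key variables are\n",
        "# only needed for quick key-based testing instead of Entra auth.\n",
        "required = [\"AZURE_OPENAI_ENDPOINT\", \"COSMOS_URI\"]\n",
        "optional = [\"AZURE_OPENAI_KEY\", \"COSMOS_KEY\"]\n",
        "\n",
        "missing = [name for name in required if not os.environ.get(name)]\n",
        "print(\"Missing required:\", missing or \"none\")\n",
        "print(\"Key-based auth vars set:\", [n for n in optional if os.environ.get(n)] or \"none\")"
      ],
      "execution_count": null,
      "outputs": []
    },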
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Ingestion and chunking baseline\n",
        "\n",
        "This example mirrors the blog's ingestion loop: split text into chunks, generate embeddings, and upsert chunk records into Cosmos DB. The code below is self-contained and supports two modes:\n",
        "\n",
        "- `DRY_RUN=True`: simulate embeddings and print documents without calling Azure\n",
        "- `DRY_RUN=False`: call Azure AI Foundry and Cosmos DB for live validation"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "import uuid\n",
        "import hashlib\n",
        "from typing import List\n",
        "\n",
        "DRY_RUN = True\n",
        "text = \"Azure Foundry helps build enterprise copilots. Cosmos DB stores durable memory and chunks.\"\n",
        "\n",
        "def simple_chunks(text: str, size: int = 60) -> List[str]:\n",
        "    return [text[i:i+size] for i in range(0, len(text), size)]\n",
        "\n",
        "def fake_embedding(s: str, dim: int = 8) -> List[float]:\n",
        "    h = hashlib.sha256(s.encode(\"utf-8\")).digest()\n",
        "    vals = [((h[i] / 255.0) * 2 - 1) for i in range(dim)]\n",
        "    return [round(v, 6) for v in vals]\n",
        "\n",
        "chunks = simple_chunks(text, 60)\n",
        "records = []\n",
        "\n",
        "if not DRY_RUN:\n",
        "    credential = DefaultAzureCredential()\n",
        "    aoai = AzureOpenAI(\n",
        "        azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n",
        "        api_version=\"2024-02-01\",\n",
        "        azure_ad_token_provider=lambda: credential.get_token(\n",
        "            \"https://cognitiveservices.azure.com/.default\"\n",
        "        ).token,\n",
        "    )\n",
        "    cosmos = CosmosClient(os.environ[\"COSMOS_URI\"], credential=credential)\n",
        "    container = cosmos.get_database_client(\"brain\").get_container_client(\"chunks\")\n",
        "\n",
        "for i, chunk in enumerate(chunks):\n",
        "    emb = fake_embedding(chunk) if DRY_RUN else aoai.embeddings.create(\n",
        "        model=\"your-embedding-deployment\",\n",
        "        input=chunk\n",
        "    ).data[0].embedding\n",
        "\n",
        "    chunk_id = f\"contoso|handbook-001|{i:05d}\"\n",
        "    doc = {\n",
        "        \"id\": chunk_id,\n",
        "        \"tenantId\": \"contoso\",\n",
        "        \"docId\": \"handbook-001\",\n",
        "        \"chunkId\": f\"{i:05d}\",\n",
        "        \"text\": chunk,\n",
        "        \"embedding\": emb,\n",
        "        \"source\": \"handbook.pdf\",\n",
        "        \"sourceUri\": \"https://storageaccount.blob.core.windows.net/docs/handbook.pdf\",\n",
        "        \"documentType\": \"policy\",\n",
        "        \"businessDomain\": \"it\",\n",
        "        \"aclTags\": [\"group:it-admins\"],\n",
        "        \"embeddingModel\": \"your-embedding-deployment\",\n",
        "        \"embeddingVersion\": \"2025-01\",\n",
        "        \"contentHash\": \"sha256:\" + hashlib.sha256(chunk.encode(\"utf-8\")).hexdigest(),\n",
        "    }\n",
        "    records.append(doc)\n",
        "    if not DRY_RUN:\n",
        "        container.upsert_item(doc)\n",
        "\n",
        "print(f\"Created {len(records)} chunk records\")\n",
        "print(json.dumps(records, indent=2)[:3000])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Required environment variables\n",
        "\n",
        "The retrieval example can run in simulated mode or against live Azure resources. For live mode, set:\n",
        "\n",
        "- `AZURE_OPENAI_ENDPOINT`\n",
        "- `AZURE_OPENAI_KEY` for key-based testing, or use Entra auth\n",
        "- `COSMOS_URI`\n",
        "- `COSMOS_KEY` for key-based testing, or use Entra auth\n",
        "\n",
        "Also replace deployment names with your Azure AI Foundry deployment names, not raw model family names."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Retrieval and grounded answer generation\n",
        "\n",
        "This example validates the retrieval-first answer path. It computes a query embedding, filters chunks by tenant and ACL, builds a grounded context window, and then generates an answer with citations.\n",
        "\n",
        "To keep the notebook runnable anywhere, the code supports a dry-run path with local sample chunks and cosine similarity. Note that the live path's `VectorDistance` query requires the `chunks` container to be provisioned with a vector embedding policy and vector index, which the baseline commands above do not yet configure."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "import hashlib\n",
        "from typing import List\n",
        "\n",
        "import numpy as np\n",
        "\n",
        "DRY_RUN = True\n",
        "question = \"What does our handbook say about enterprise copilots?\"\n",
        "tenant_id = \"contoso\"\n",
        "allowed_acl_tags = [\"group:it-admins\", \"region:us\"]\n",
        "\n",
        "sample_items = [\n",
        "    {\n",
        "        \"text\": \"Azure Foundry helps build enterprise copilots with deployment-based model access.\",\n",
        "        \"source\": \"handbook.pdf\",\n",
        "        \"sourceUri\": \"https://storage/docs/handbook.pdf\",\n",
        "        \"tenantId\": \"contoso\",\n",
        "        \"aclTags\": [\"group:it-admins\", \"region:us\"]\n",
        "    },\n",
        "    {\n",
        "        \"text\": \"Cosmos DB stores durable memory, chunk metadata, and operational state.\",\n",
        "        \"source\": \"architecture.pdf\",\n",
        "        \"sourceUri\": \"https://storage/docs/architecture.pdf\",\n",
        "        \"tenantId\": \"contoso\",\n",
        "        \"aclTags\": [\"group:it-admins\"]\n",
        "    },\n",
        "    {\n",
        "        \"text\": \"APIM provides JWT validation, rate limiting, and governed API exposure.\",\n",
        "        \"source\": \"platform.pdf\",\n",
        "        \"sourceUri\": \"https://storage/docs/platform.pdf\",\n",
        "        \"tenantId\": \"contoso\",\n",
        "        \"aclTags\": [\"group:platform-admins\"]\n",
        "    }\n",
        "]\n",
        "\n",
        "def fake_embedding(s: str, dim: int = 16) -> List[float]:\n",
        "    h = hashlib.sha256(s.encode(\"utf-8\")).digest()\n",
        "    vals = [((h[i] / 255.0) * 2 - 1) for i in range(dim)]\n",
        "    return vals\n",
        "\n",
        "def cosine(a: List[float], b: List[float]) -> float:\n",
        "    a = np.array(a)\n",
        "    b = np.array(b)\n",
        "    denom = np.linalg.norm(a) * np.linalg.norm(b)\n",
        "    return float(np.dot(a, b) / denom) if denom else 0.0\n",
        "\n",
        "if DRY_RUN:\n",
        "    qvec = fake_embedding(question)\n",
        "    filtered = []\n",
        "    for item in sample_items:\n",
        "        if item[\"tenantId\"] != tenant_id:\n",
        "            continue\n",
        "        if not set(item[\"aclTags\"]).intersection(set(allowed_acl_tags)):\n",
        "            continue\n",
        "        score = cosine(qvec, fake_embedding(item[\"text\"]))\n",
        "        filtered.append({**item, \"score\": score})\n",
        "    items = sorted(filtered, key=lambda x: x[\"score\"], reverse=True)[:5]\n",
        "    context = \"\\n\".join(\n",
        "        f\"[{i+1}] {x['text']} (source: {x['source']})\"\n",
        "        for i, x in enumerate(items)\n",
        "    )\n",
        "    answer = \"Azure Foundry is described as helping build enterprise copilots with deployment-based model access [1]. Cosmos DB is described as storing durable memory and chunk metadata [2].\"\n",
        "    print(\"Context:\\n\", context)\n",
        "    print(\"\\nAnswer:\\n\", answer)\n",
        "else:\n",
        "    credential = DefaultAzureCredential()\n",
        "    aoai = AzureOpenAI(\n",
        "        azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n",
        "        api_version=\"2024-02-01\",\n",
        "        azure_ad_token_provider=lambda: credential.get_token(\n",
        "            \"https://cognitiveservices.azure.com/.default\"\n",
        "        ).token,\n",
        "    )\n",
        "    cosmos = CosmosClient(os.environ[\"COSMOS_URI\"], credential=credential)\n",
        "    container = cosmos.get_database_client(\"brain\").get_container_client(\"chunks\")\n",
        "\n",
        "    qvec = aoai.embeddings.create(\n",
        "        model=\"your-embedding-deployment\",\n",
        "        input=question\n",
        "    ).data[0].embedding\n",
        "\n",
        "    query = \"\"\"\n",
        "    SELECT TOP 5\n",
        "        c.text,\n",
        "        c.source,\n",
        "        c.sourceUri,\n",
        "        c.aclTags,\n",
        "        VectorDistance(c.embedding, @qvec) AS score\n",
        "    FROM c\n",
        "    WHERE c.tenantId = @tenantId\n",
        "    ORDER BY VectorDistance(c.embedding, @qvec)\n",
        "    \"\"\"\n",
        "\n",
        "    items = list(container.query_items(\n",
        "        query=query,\n",
        "        parameters=[\n",
        "            {\"name\": \"@qvec\", \"value\": qvec},\n",
        "            {\"name\": \"@tenantId\", \"value\": tenant_id},\n",
        "        ],\n",
        "        partition_key=tenant_id\n",
        "    ))\n",
        "\n",
        "    items = [x for x in items if set(allowed_acl_tags).intersection(set(x.get(\"aclTags\", [])))]\n",
        "    context = \"\\n\".join(\n",
        "        f\"[{i+1}] {x['text']} (source: {x['source']})\"\n",
        "        for i, x in enumerate(items)\n",
        "    )\n",
        "    messages = [\n",
        "        {\n",
        "            \"role\": \"system\",\n",
        "            \"content\": \"Answer only from the provided context. Cite sources like [1]. If evidence is insufficient, say so.\"\n",
        "        },\n",
        "        {\n",
        "            \"role\": \"user\",\n",
        "            \"content\": f\"Context:\\n{context}\\n\\nQuestion: {question}\"\n",
        "        }\n",
        "    ]\n",
        "    resp = aoai.chat.completions.create(\n",
        "        model=\"your-chat-deployment\",\n",
        "        messages=messages,\n",
        "        temperature=0\n",
        "    )\n",
        "    print(resp.choices[0].message.content)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Durable memory and preferences\n",
        "\n",
        "This example persists user preferences and episodic memory. The key design rule is selectivity: not every conversation turn should become durable memory.\n",
        "\n",
        "The code below includes a simple write rule that avoids storing obviously sensitive content and supports dry-run validation."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "import time\n",
        "import uuid\n",
        "\n",
        "DRY_RUN = True\n",
        "user_id = \"u-123\"\n",
        "conversation_id = \"conv-456\"\n",
        "\n",
        "preferences_doc = {\n",
        "    \"id\": user_id,\n",
        "    \"userId\": user_id,\n",
        "    \"tone\": \"concise\",\n",
        "    \"topics\": [\"azure\", \"cosmosdb\"]\n",
        "}\n",
        "\n",
        "message = {\n",
        "    \"id\": str(uuid.uuid4()),\n",
        "    \"userId\": user_id,\n",
        "    \"conversationId\": conversation_id,\n",
        "    \"role\": \"assistant\",\n",
        "    \"content\": \"Use grounded answers with citations.\",\n",
        "    \"memoryType\": \"episodic\",\n",
        "    \"ttl\": 86400,\n",
        "    \"createdAt\": int(time.time())\n",
        "}\n",
        "\n",
        "should_store = len(message[\"content\"]) > 20 and \"password\" not in message[\"content\"].lower()\n",
        "\n",
        "if DRY_RUN:\n",
        "    print(\"Preferences document:\")\n",
        "    print(json.dumps(preferences_doc, indent=2))\n",
        "    print(\"\\nMemory write allowed:\", should_store)\n",
        "    if should_store:\n",
        "        print(json.dumps(message, indent=2))\n",
        "else:\n",
        "    credential = DefaultAzureCredential()\n",
        "    cosmos = CosmosClient(os.environ[\"COSMOS_URI\"], credential=credential)\n",
        "    db = cosmos.get_database_client(\"brain\")\n",
        "    memory = db.get_container_client(\"memory\")\n",
        "    prefs = db.get_container_client(\"preferences\")\n",
        "\n",
        "    prefs.upsert_item(preferences_doc)\n",
        "    if should_store:\n",
        "        memory.upsert_item(message)\n",
        "    print(\"Memory and preferences persisted.\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Required environment variables\n",
        "\n",
        "The next example reads from Cosmos DB to assemble a personalized prompt. For live mode, set:\n",
        "\n",
        "- `COSMOS_URI`\n",
        "- `COSMOS_KEY` if using key-based access\n",
        "\n",
        "For production validation, use `DefaultAzureCredential` and Cosmos DB RBAC."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Load recent memory and preferences\n",
        "\n",
        "This example shows how to build a personalized system prompt from stable preferences plus recent episodic memory. In a production system, older turns should usually be summarized rather than replayed verbatim."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "\n",
        "DRY_RUN = True\n",
        "user_id = \"u-123\"\n",
        "\n",
        "if DRY_RUN:\n",
        "    pref = {\"id\": user_id, \"userId\": user_id, \"tone\": \"concise\", \"topics\": [\"azure\", \"cosmosdb\"]}\n",
        "    recent = [\n",
        "        {\"role\": \"assistant\", \"content\": \"Use grounded answers with citations.\"},\n",
        "        {\"role\": \"user\", \"content\": \"Help me design an enterprise second brain.\"}\n",
        "    ]\n",
        "else:\n",
        "    credential = DefaultAzureCredential()\n",
        "    cosmos = CosmosClient(os.environ[\"COSMOS_URI\"], credential=credential)\n",
        "    db = cosmos.get_database_client(\"brain\")\n",
        "    memory = db.get_container_client(\"memory\")\n",
        "    prefs = db.get_container_client(\"preferences\")\n",
        "    pref = prefs.read_item(item=user_id, partition_key=user_id)\n",
        "    recent = list(memory.query_items(\n",
        "        query=\"SELECT TOP 5 c.role, c.content FROM c WHERE c.userId=@u ORDER BY c.createdAt DESC\",\n",
        "        parameters=[{\"name\": \"@u\", \"value\": user_id}],\n",
        "        partition_key=user_id\n",
        "    ))\n",
        "\n",
        "system_prompt = f\"User prefers a {pref['tone']} tone and cares about {', '.join(pref['topics'])}.\"\n",
        "history = \"\\n\".join(f\"{m['role']}: {m['content']}\" for m in recent)\n",
        "print(system_prompt + \"\\n\" + history)"
      ],
      "execution_count": null,
      "outputs": []
    },
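    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As noted above, older turns should usually be summarized rather than replayed verbatim. Here is a minimal sketch of that budget rule: keep the last few turns as-is and collapse the rest into one summary message. The placeholder summary string stands in for a real summarization call to the chat deployment."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from typing import Dict, List\n",
        "\n",
        "# Keep the newest turns verbatim; fold older turns into one summary line.\n",
        "# In production the summary text would come from a summarization call.\n",
        "def compress_history(turns: List[Dict], keep_last: int = 2) -> List[Dict]:\n",
        "    older, recent_turns = turns[:-keep_last], turns[-keep_last:]\n",
        "    if not older:\n",
        "        return recent_turns\n",
        "    summary = {\"role\": \"system\", \"content\": f\"Summary of {len(older)} earlier turns.\"}\n",
        "    return [summary] + recent_turns\n",
        "\n",
        "turns = [\n",
        "    {\"role\": \"user\", \"content\": \"What is a second brain?\"},\n",
        "    {\"role\": \"assistant\", \"content\": \"A durable, retrievable knowledge store.\"},\n",
        "    {\"role\": \"user\", \"content\": \"Help me design one on Azure.\"},\n",
        "    {\"role\": \"assistant\", \"content\": \"Use Foundry for models and Cosmos DB for state.\"},\n",
        "]\n",
        "for m in compress_history(turns):\n",
        "    print(m[\"role\"], \"->\", m[\"content\"])"
      ],
      "execution_count": null,
      "outputs": []
    },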
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## APIM governance baseline\n",
        "\n",
        "The blog uses PowerShell to publish the backend through Azure API Management with JWT validation and rate limiting. This Python cell generates the equivalent CLI commands and policy XML so you can validate the governance baseline before deployment."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "rg = \"rg-2ndbrain-demo\"\n",
        "apim = \"apim-2ndbrain-demo\"\n",
        "api_id = \"brain-api\"\n",
        "backend_url = \"https://2ndbrain-api.azurewebsites.net\"\n",
        "\n",
        "auth_policy = f'''<policies>\n",
        "  <inbound>\n",
        "    <base />\n",
        "    <validate-jwt header-name=\"Authorization\" require-scheme=\"Bearer\" failed-validation-httpcode=\"401\" failed-validation-error-message=\"Unauthorized\">\n",
        "      <openid-config url=\"https://login.microsoftonline.com/<tenant-id>/v2.0/.well-known/openid-configuration\" />\n",
        "      <audiences>\n",
        "        <audience>api://second-brain-api</audience>\n",
        "      </audiences>\n",
        "      <issuers>\n",
        "        <issuer>https://login.microsoftonline.com/<tenant-id>/v2.0</issuer>\n",
        "      </issuers>\n",
        "      <required-claims>\n",
        "        <claim name=\"scp\" match=\"any\">\n",
        "          <value>SecondBrain.Read</value>\n",
        "          <value>SecondBrain.Write</value>\n",
        "        </claim>\n",
        "      </required-claims>\n",
        "    </validate-jwt>\n",
        "    <rate-limit-by-key calls=\"30\" renewal-period=\"60\" counter-key=\"@(context.Request.IpAddress)\" />\n",
        "    <set-header name=\"x-correlation-id\" exists-action=\"override\">\n",
        "      <value>@(context.RequestId.ToString())</value>\n",
        "    </set-header>\n",
        "    <set-backend-service base-url=\"{backend_url}\" />\n",
        "  </inbound>\n",
        "  <backend>\n",
        "    <base />\n",
        "  </backend>\n",
        "  <outbound>\n",
        "    <base />\n",
        "  </outbound>\n",
        "  <on-error>\n",
        "    <base />\n",
        "  </on-error>\n",
        "</policies>'''\n",
        "\n",
        "commands = [\n",
        "    f\"az apim api create -g {rg} --service-name {apim} --api-id {api_id} --path brain --display-name '2nd Brain API' --protocols https --service-url {backend_url}\",\n",
        "    # Azure CLI has no native command for APIM policy documents at the time of\n",
        "    # writing; apply policy.xml via the portal, Set-AzApiManagementPolicy, or ARM.\n",
        "]\n",
        "\n",
        "print(\"APIM commands:\\n\")\n",
        "print(\"\\n\".join(commands))\n",
        "print(\"\\nPolicy XML:\\n\")\n",
        "print(auth_policy)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Runtime request path\n",
        "\n",
        "At runtime, APIM should validate the caller before the backend spends tokens or RU. The backend then loads preferences and memory, retrieves authorized chunks, calls the chat deployment with grounded context, and returns an answer with citations."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "runtime_sequence = [\n",
        "    \"User -> APIM: POST /brain/ask\",\n",
        "    \"APIM -> API: validate JWT, apply rate limit, forward request\",\n",
        "    \"API -> Cosmos DB: load preferences and recent memory\",\n",
        "    \"API -> Foundry: create query embedding\",\n",
        "    \"API -> Cosmos DB: retrieve tenant- and ACL-filtered chunks\",\n",
        "    \"API -> Foundry: generate grounded answer with context\",\n",
        "    \"API -> APIM: return JSON response\",\n",
        "    \"APIM -> User: 200 OK\"\n",
        "]\n",
        "\n",
        "for step in runtime_sequence:\n",
        "    print(step)"
      ],
      "execution_count": null,
      "outputs": []
    },
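    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The sequence above can be sketched as one backend orchestration function. The helpers here are stubs standing in for the Cosmos DB and Foundry calls from earlier cells; the names (`load_preferences`, `retrieve_chunks`, `generate_answer`, `ask`) are illustrative, not the blog's API."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json\n",
        "\n",
        "# Stub helpers standing in for the real Cosmos DB and Foundry calls.\n",
        "def load_preferences(user_id: str) -> dict:\n",
        "    return {\"tone\": \"concise\"}\n",
        "\n",
        "def retrieve_chunks(tenant_id: str, question: str) -> list:\n",
        "    return [{\"text\": \"Azure Foundry helps build enterprise copilots.\", \"source\": \"handbook.pdf\"}]\n",
        "\n",
        "def generate_answer(question: str, chunks: list, prefs: dict) -> dict:\n",
        "    return {\n",
        "        \"answer\": f\"Grounded answer to: {question}\",\n",
        "        \"citations\": [c[\"source\"] for c in chunks],\n",
        "    }\n",
        "\n",
        "# Orchestration behind POST /brain/ask: preferences, retrieval, generation.\n",
        "def ask(tenant_id: str, user_id: str, question: str) -> dict:\n",
        "    prefs = load_preferences(user_id)\n",
        "    chunks = retrieve_chunks(tenant_id, question)\n",
        "    return generate_answer(question, chunks, prefs)\n",
        "\n",
        "print(json.dumps(ask(\"contoso\", \"u-123\", \"What is our copilot strategy?\"), indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },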
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Cost, throughput, and reliability checklist\n",
        "\n",
        "An enterprise second brain needs measurable controls. This cell turns the blog guidance into a compact validation checklist you can adapt into operational reviews."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "checklist = {\n",
        "    \"token_cost_controls\": [\n",
        "        \"Cap retrieved chunks\",\n",
        "        \"Use prompt budgets\",\n",
        "        \"Route simple Q&A to lower-cost chat deployment\",\n",
        "        \"Summarize long histories instead of replaying them\",\n",
        "        \"Avoid unnecessary re-embedding\"\n",
        "    ],\n",
        "    \"cosmos_ru_controls\": [\n",
        "        \"Choose partition keys aligned to access patterns\",\n",
        "        \"Avoid one giant container\",\n",
        "        \"Exclude non-queryable fields from indexing where appropriate\",\n",
        "        \"Keep chunk documents compact\",\n",
        "        \"Use TTL for short-lived memory\"\n",
        "    ],\n",
        "    \"reliability_targets\": [\n",
        "        \"Ingestion latency\",\n",
        "        \"Retrieval latency\",\n",
        "        \"Answer latency\",\n",
        "        \"Groundedness\",\n",
        "        \"Cost per query\",\n",
        "        \"Ingestion failure rate\"\n",
        "    ]\n",
        "}\n",
        "\n",
        "print(json.dumps(checklist, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
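    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Two of the token cost controls above, capping retrieved chunks and using prompt budgets, can be combined in one small helper. This character-based version is a simplification; a production budget would count tokens with the model's tokenizer."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from typing import List\n",
        "\n",
        "# Include chunks in ranked order until the chunk cap or character budget\n",
        "# is exhausted; anything past the budget is dropped before prompting.\n",
        "def apply_prompt_budget(chunks: List[str], max_chars: int = 2000, max_chunks: int = 5) -> List[str]:\n",
        "    kept, used = [], 0\n",
        "    for chunk in chunks[:max_chunks]:\n",
        "        if used + len(chunk) > max_chars:\n",
        "            break\n",
        "        kept.append(chunk)\n",
        "        used += len(chunk)\n",
        "    return kept\n",
        "\n",
        "ranked = [\"a\" * 90, \"b\" * 90, \"c\" * 90]\n",
        "print(len(apply_prompt_budget(ranked, max_chars=200)))  # fits 2 of 3\n",
        "print(len(apply_prompt_budget(ranked, max_chars=200, max_chunks=1)))  # capped at 1"
      ],
      "execution_count": null,
      "outputs": []
    },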
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Evaluation harness starter\n",
        "\n",
        "A production-ready second brain should be replay-tested when you change chunking, embeddings, prompts, or retrieval logic. This simple evaluation scaffold lets you define expected evidence and compare outputs over time."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "evaluation_set = [\n",
        "    {\n",
        "        \"question\": \"What does our handbook say about enterprise copilots?\",\n",
        "        \"expected_sources\": [\"handbook.pdf\"],\n",
        "        \"must_include\": [\"enterprise copilots\", \"deployment-based model access\"]\n",
        "    },\n",
        "    {\n",
        "        \"question\": \"Where is durable memory stored?\",\n",
        "        \"expected_sources\": [\"architecture.pdf\"],\n",
        "        \"must_include\": [\"Cosmos DB\", \"durable memory\"]\n",
        "    }\n",
        "]\n",
        "\n",
        "def score_answer(answer: str, expected: dict) -> dict:\n",
        "    answer_l = answer.lower()\n",
        "    hit_terms = [term for term in expected[\"must_include\"] if term.lower() in answer_l]\n",
        "    return {\n",
        "        \"question\": expected[\"question\"],\n",
        "        \"term_recall\": len(hit_terms) / max(len(expected[\"must_include\"]), 1),\n",
        "        \"matched_terms\": hit_terms\n",
        "    }\n",
        "\n",
        "sample_answer = \"Azure Foundry helps build enterprise copilots with deployment-based model access. Cosmos DB stores durable memory.\"\n",
        "results = [score_answer(sample_answer, item) for item in evaluation_set]\n",
        "print(json.dumps(results, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
        "This notebook validated the main building blocks of an enterprise-ready second brain on Azure:\n",
        "\n",
        "- a retrieval-first architecture using Azure AI Foundry and Cosmos DB\n",
        "- a chunk schema with tenant, ACL, and embedding version metadata\n",
        "- ingestion and chunking patterns for durable knowledge storage\n",
        "- grounded retrieval and answer generation with citations\n",
        "- selective durable memory and user preferences\n",
        "- APIM as the governed enterprise front door\n",
        "- cost, reliability, and evaluation checklists for production hardening\n",
        "\n",
        "## Next Steps\n",
        "\n",
        "- Replace dry-run paths with live Azure resources and deployment names\n",
        "- Add FastAPI endpoints for `/ingest`, `/ask`, and `/memory`\n",
        "- Implement stricter ACL intersection logic in retrieval queries\n",
        "- Add Cosmos DB indexing policy tuning and vector search validation\n",
        "- Publish the backend through APIM with tenant-aware quotas and logging\n",
        "- Build a replayable evaluation suite for groundedness, citation quality, and ACL correctness"
      ]
    }
  ]
}