{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.13.0"
    },
    "blog_metadata": {
      "topic": "Preventing Credential Oversharing in Copilot Studio Before It Becomes a Governance Incident",
      "slug": "preventing-credential-oversharing-in-copilot-studio-before-i",
      "generated_by": "LinkedIn Post Generator + Azure OpenAI",
      "generated_at": "2026-07-02T12:40:11.700Z"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Preventing Credential Oversharing in Copilot Studio Before It Becomes a Governance Incident\n",
        "\n",
        "Credential oversharing in Copilot Studio is usually not an AI prompt failure; it is a governance design failure that surfaces through prompts, connectors, copied assets, and weak promotion paths. This notebook turns the blog post into hands-on validation steps using Python so you can simulate scans, identify risky connection patterns, validate environment segmentation, and test promotion controls before they become incidents.\n",
        "\n",
        "The focus is practical: detect embedded secrets, surface over-broad connector sharing, enforce promotion gates, and summarize governance signals that should shape rollout decisions early."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%pip install pandas"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import re\n",
        "import json\n",
        "import shutil\n",
        "from pathlib import Path\n",
        "\n",
        "import pandas as pd"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup sample data for validation\n",
        "\n",
        "The blog emphasizes that governance failures often show up first in exported solution files and copied artifacts. This setup cell creates local sample folders and files so the later scans can run end-to-end without external dependencies. It deletes and recreates `exported_solution/` and `artifacts/` in the current working directory, so run the notebook from a scratch folder."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from pathlib import Path\n",
        "import shutil\n",
        "\n",
        "base_dirs = [Path('exported_solution'), Path('artifacts')]\n",
        "for d in base_dirs:\n",
        "    if d.exists():\n",
        "        shutil.rmtree(d)\n",
        "    d.mkdir(parents=True, exist_ok=True)\n",
        "\n",
        "sample_solution_files = {\n",
        "    'exported_solution/agent.json': '{\"name\": \"FinanceAgent\", \"instruction\": \"Use bearer abcdefghijklmnop\", \"notes\": \"internal only\"}',\n",
        "    'exported_solution/settings.yaml': 'api_key: \"ABCDEF1234567890\"\\nendpoint: \"https://internal.contoso.local\"',\n",
        "    'exported_solution/connection.txt': 'Server=tcp:sql.contoso.local;Password=SuperSecret123;Database=Finance',\n",
        "    'exported_solution/readme.txt': 'No secrets here, just documentation.'\n",
        "}\n",
        "\n",
        "for file_path, content in sample_solution_files.items():\n",
        "    path = Path(file_path)\n",
        "    path.parent.mkdir(parents=True, exist_ok=True)\n",
        "    path.write_text(content, encoding='utf-8')\n",
        "\n",
        "artifact_files = {\n",
        "    'artifacts/prompt1.json': '{\"prompt\": \"normal content\", \"owner\": \"team-a\"}',\n",
        "    'artifacts/prompt2.json': '{\"access_token\": \"tok_123456789_secret_value\"}',\n",
        "    'artifacts/prompt3.json': '{\"jwt\": \"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.abc.def\"}'\n",
        "}\n",
        "\n",
        "for file_path, content in artifact_files.items():\n",
        "    path = Path(file_path)\n",
        "    path.parent.mkdir(parents=True, exist_ok=True)\n",
        "    path.write_text(content, encoding='utf-8')\n",
        "\n",
        "prompt_export = Path('prompt-export.json')\n",
        "prompt_export.write_text(\n",
        "    '{\"api_key\": \"REALKEY123456\", \"password\": \"P@ssw0rd!\", \"header\": \"Bearer abcdefghijklmnop\"}',\n",
        "    encoding='utf-8'\n",
        ")\n",
        "\n",
        "print('Sample files created.')\n",
        "for d in base_dirs:\n",
        "    print(f'\\nContents of {d}:')\n",
        "    for p in sorted(d.rglob('*')):\n",
        "        if p.is_file():\n",
        "            print('-', p)\n",
        "print('\\nCreated prompt-export.json for redaction demo.')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Scan exported solution files for likely embedded secrets before promotion\n",
        "\n",
        "This example validates the minimum viable control described in the post: scan exported Power Platform solution contents before import to a higher-trust environment. The logic is intentionally blunt because false positives are cheaper than promoting an artifact that contains a credential."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "patterns = {\n",
        "    # The optional quote after the key name also catches JSON-style quoted keys.\n",
        "    'api_key': re.compile(r\"(?i)(api[_-]?key|secret)['\\\"]?\\s*[:=]\\s*['\\\"][A-Za-z0-9_\\-]{12,}['\\\"]\"),\n",
        "    'bearer': re.compile(r\"(?i)bearer\\s+[A-Za-z0-9\\-._~+/]+=*\"),\n",
        "    'connection_string': re.compile(r\"(?i)(password|accountkey|sharedaccesskey)\\s*=\"),\n",
        "}\n",
        "\n",
        "root = Path('exported_solution')\n",
        "findings = []\n",
        "for path in root.rglob('*'):\n",
        "    if path.is_file() and path.suffix.lower() in {'.json', '.yaml', '.yml', '.txt'}:\n",
        "        text = path.read_text(encoding='utf-8', errors='ignore')\n",
        "        for name, pattern in patterns.items():\n",
        "            if pattern.search(text):\n",
        "                findings.append({'file': str(path), 'finding': name})\n",
        "                print(f'[ALERT] {name} found in {path}')\n",
        "\n",
        "scan_df = pd.DataFrame(findings)\n",
        "scan_df if not scan_df.empty else pd.DataFrame(columns=['file', 'finding'])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Report high-privilege or shared connection references for least-privilege redesign\n",
        "\n",
        "The blog argues that the quietest blast-radius multiplier is a single powerful connection reused by many agents. This Python version recreates the reporting logic to identify risky combinations of high privilege, broad sharing, and weak authentication methods."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "connection_refs = pd.DataFrame([\n",
        "    {'Name': 'Dataverse-App', 'Connector': 'Dataverse', 'Owner': 'svc-prod', 'AuthType': 'ServicePrincipal', 'SharedWith': 12, 'Privilege': 'High'},\n",
        "    {'Name': 'Graph-User', 'Connector': 'Office365Users', 'Owner': 'alice@contoso.com', 'AuthType': 'Delegated', 'SharedWith': 0, 'Privilege': 'Medium'},\n",
        "    {'Name': 'SQL-Shared', 'Connector': 'SQL', 'Owner': 'svc-shared', 'AuthType': 'UsernamePassword', 'SharedWith': 25, 'Privilege': 'High'}\n",
        "])\n",
        "\n",
        "risky = connection_refs[\n",
        "    (connection_refs['SharedWith'] > 5) |\n",
        "    (connection_refs['Privilege'].eq('High')) |\n",
        "    (connection_refs['AuthType'].eq('UsernamePassword'))\n",
        "].copy()\n",
        "\n",
        "privilege_order = {'High': 2, 'Medium': 1, 'Low': 0}\n",
        "risky['PrivilegeRank'] = risky['Privilege'].map(privilege_order)\n",
        "risky = risky.sort_values(['PrivilegeRank', 'SharedWith'], ascending=[False, False])\n",
        "result = risky[['Name', 'Connector', 'Owner', 'AuthType', 'SharedWith', 'Privilege']].reset_index(drop=True)\n",
        "result"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Inventory environments to support segmentation and governance review\n",
        "\n",
        "Environment segmentation is one of the first guardrails recommended in the post. This example lists the region, type, and DLP policy of each environment and counts environments by type, so gaps in segmentation are visible instead of being left fuzzy."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "envs = pd.DataFrame([\n",
        "    {'Name': 'Dev', 'Region': 'US', 'Type': 'Sandbox', 'DlpPolicy': 'Makers-Low'},\n",
        "    {'Name': 'Test', 'Region': 'US', 'Type': 'Sandbox', 'DlpPolicy': 'Restricted'},\n",
        "    {'Name': 'Prod', 'Region': 'US', 'Type': 'Production', 'DlpPolicy': 'Strict'}\n",
        "])\n",
        "\n",
        "sorted_envs = envs.sort_values(['Type', 'Name']).reset_index(drop=True)\n",
        "print('Environment inventory:')\n",
        "display(sorted_envs)\n",
        "\n",
        "print('Counts by environment type:')\n",
        "counts = envs.groupby('Type').size().reset_index(name='Count')\n",
        "display(counts)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Fail a pipeline when prompt artifacts contain token-like values or hardcoded credentials\n",
        "\n",
        "This example turns the blog’s governance pattern into a validation gate. Instead of calling `sys.exit` as a real pipeline step would, the code sets a `status` variable to `PASS` or `FAIL` so you can inspect the findings interactively."
      ]
    },
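    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "token_patterns = [\n",
        "    # JWT-shaped value: dot-separated segments starting with eyJ\n",
        "    re.compile(r\"\\beyJ[A-Za-z0-9_-]+\\.[A-Za-z0-9._-]+\\.[A-Za-z0-9._-]+\\b\"),\n",
        "    # Named token fields; the optional quote after the key matches JSON-style quoted keys\n",
        "    re.compile(r\"(?i)(client_secret|access_token|refresh_token)['\\\"]?\\s*[:=]\\s*['\\\"][^'\\\"]+['\\\"]\"),\n",
        "]\n",
        "\n",
        "hits = []\n",
        "for file in sorted(Path('artifacts').rglob('*.json')):\n",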
        "    content = file.read_text(encoding='utf-8', errors='ignore')\n",
        "    if any(p.search(content) for p in token_patterns):\n",
        "        hits.append(str(file))\n",
        "\n",
        "status = 'FAIL' if hits else 'PASS'\n",
        "print('\\n'.join(hits) if hits else 'No suspicious tokens found.')\n",
        "print(f'Pipeline status: {status}')\n",
        "\n",
        "pd.DataFrame({'suspicious_artifact': hits}) if hits else pd.DataFrame(columns=['suspicious_artifact'])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Redact suspicious values in a prompt export to create a safer review artifact\n",
        "\n",
        "Redaction is useful for governance review workflows. This cell writes a sanitized copy of `prompt-export.json` so reviewers can inspect structure and context without seeing raw secrets."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "source = Path('prompt-export.json')\n",
        "text = source.read_text(encoding='utf-8', errors='ignore')\n",
        "\n",
        "# Keep the key and surrounding quotes (groups 1 and 3); replace only the value.\n",
        "redacted = re.sub(r'(?i)(\"?(api[_-]?key|client_secret|access_token|refresh_token|password)\"?\\s*:\\s*\")[^\"]+(\")', r'\\1REDACTED\\3', text)\n",
        "redacted = re.sub(r'(?i)(bearer\\s+)[A-Za-z0-9\\-._~+/]+=*', r'\\1REDACTED', redacted)\n",
        "\n",
        "target = Path('prompt-export.redacted.json')\n",
        "target.write_text(redacted, encoding='utf-8')\n",
        "print(f'Created {target}')\n",
        "print(target.read_text(encoding='utf-8'))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Flag solution movement patterns that bypass expected promotion paths\n",
        "\n",
        "The post warns that copied artifacts and ALM shortcuts are where many teams get burned. This example identifies direct Dev-to-Prod moves and unapproved promotions, both of which usually indicate that security review was bypassed."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "moves = pd.DataFrame([\n",
        "    {'Solution': 'SalesCopilot', 'From': 'Dev', 'To': 'Test', 'Approved': True},\n",
        "    {'Solution': 'SalesCopilot', 'From': 'Test', 'To': 'Prod', 'Approved': True},\n",
        "    {'Solution': 'BotOps', 'From': 'Dev', 'To': 'Prod', 'Approved': False}\n",
        "])\n",
        "\n",
        "violations = moves[\n",
        "    ((moves['From'] == 'Dev') & (moves['To'] == 'Prod')) |\n",
        "    (~moves['Approved'])\n",
        "].reset_index(drop=True)\n",
        "\n",
        "violations"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summarize connector usage to spot risky connector concentration by environment\n",
        "\n",
        "A mature governance model should make connector concentration visible, especially in low-control environments. This example highlights where risky or powerful connectors may be clustering so teams know where to focus remediation first."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "connectors = pd.DataFrame([\n",
        "    {'Environment': 'Dev', 'Connector': 'HTTP', 'Count': 8},\n",
        "    {'Environment': 'Dev', 'Connector': 'Dataverse', 'Count': 14},\n",
        "    {'Environment': 'Prod', 'Connector': 'HTTP', 'Count': 1},\n",
        "    {'Environment': 'Prod', 'Connector': 'SQL', 'Count': 9}\n",
        "])\n",
        "\n",
        "summary = connectors.sort_values(['Environment', 'Count'], ascending=[True, False]).reset_index(drop=True)\n",
        "summary"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualize the pre-promotion governance gate\n",
        "\n",
        "The original post includes a flow showing the right mental model: export, scan, block if secrets are found, then review and remediate. This Python cell renders the same flow as plain text so it remains notebook-friendly without requiring Mermaid support."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "flow_steps = [\n",
        "    'Maker exports solution or prompt assets',\n",
        "    'Pre-promotion scan',\n",
        "    'Decision: secrets or tokens detected?',\n",
        "    'If yes: block promotion',\n",
        "    'Open governance review',\n",
        "    'If no: promote to target environment',\n",
        "    'Post-deploy connector and connection review',\n",
        "    'Least-privilege remediation if needed'\n",
        "]\n",
        "\n",
        "for i, step in enumerate(flow_steps, start=1):\n",
        "    print(f'{i}. {step}')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualize the lightweight approval workflow\n",
        "\n",
        "The post also recommends a lightweight review pattern that does not crush makers: submit, scan, review findings, and either block or allow promotion. This cell prints that sequence as a simple message trace. The final decision reuses the `hits` list from the pipeline gate cell above, so run that cell first."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "workflow = [\n",
        "    ('Maker', 'Pipeline', 'Submit solution export'),\n",
        "    ('Pipeline', 'Scanner', 'Scan prompts, flows, connection refs'),\n",
        "    ('Scanner', 'Pipeline', 'Findings report')\n",
        "]\n",
        "\n",
        "for sender, receiver, action in workflow:\n",
        "    print(f'{sender} -> {receiver}: {action}')\n",
        "\n",
        "if hits:\n",
        "    print('Pipeline -> Maker: Block promotion')\n",
        "    print('Pipeline -> Admin: Request governance review')\n",
        "else:\n",
        "    print('Pipeline -> Maker: Allow promotion')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Governance checklist derived from the post\n",
        "\n",
        "The blog’s core argument is that oversharing is an operating-model problem. This checklist converts the narrative into concrete controls you can review during rollout."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "checklist = pd.DataFrame([\n",
        "    {'Control': 'Environment segmentation', 'Question': 'Are dev, test, and prod separated with explicit DLP posture?', 'Status': 'Review'},\n",
        "    {'Control': 'Connector approvals', 'Question': 'Do new external connectors require approval?', 'Status': 'Review'},\n",
        "    {'Control': 'Secret scanning', 'Question': 'Are prompts and exported artifacts scanned before promotion?', 'Status': 'Implemented in demo'},\n",
        "    {'Control': 'Promotion gates', 'Question': 'Can assets bypass expected ALM paths?', 'Status': 'Review'},\n",
        "    {'Control': 'Least privilege', 'Question': 'Are shared high-privilege connections rare and visible?', 'Status': 'Review'},\n",
        "    {'Control': 'Recertification', 'Question': 'Are production agents and connection references reviewed periodically?', 'Status': 'Review'}\n",
        "])\n",
        "\n",
        "checklist"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Next Steps\n",
        "\n",
        "This notebook demonstrated the main governance patterns from the post on sample data: scanning exported artifacts for secrets, identifying risky connection references, inventorying environments for segmentation review, flagging promotion-path violations, and surfacing connector concentration. The key lesson is that credential oversharing is usually a control-plane problem involving identities, connectors, environments, and ALM discipline rather than a simple prompt-writing mistake.\n",
        "\n",
        "Next steps:\n",
        "1. Replace the sample datasets with exports from your own Copilot Studio and Power Platform inventory.\n",
        "2. Add these scans to your CI/CD or change-management workflow as mandatory promotion gates.\n",
        "3. Define approval criteria for new connectors, elevated permissions, and cross-environment movement.\n",
        "4. Align Power Platform administration, DLP, and Purview review into one operating model.\n",
        "5. Schedule periodic recertification for production agents, connection references, and exceptions."
      ]
    }
  ]
}