{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.13.0"
    },
    "blog_metadata": {
      "topic": "Token efficiency as the new FinOps metric for GitHub and Copilot agent workflows",
      "slug": "token-efficiency-as-the-new-finops-metric-for-github-and-cop",
      "generated_by": "LinkedIn Post Generator + Azure OpenAI",
      "generated_at": "2026-05-08T19:14:57.187Z"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Token efficiency as the new FinOps metric for GitHub and Copilot agent workflows\n",
        "\n",
        "Seat counts tell you who has access to AI, but they do not explain whether AI-assisted delivery is economically efficient. This notebook turns the blog post into hands-on validation steps using Python so you can model telemetry, compute token-efficiency KPIs, detect waste patterns, and compare repositories or teams with simple governance-oriented heuristics."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%pip install -q pandas matplotlib seaborn"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import math\n",
        "from collections import defaultdict\n",
        "\n",
        "import pandas as pd\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "sns.set_theme(style='whitegrid')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Optimization loop architecture\n",
        "\n",
        "The blog frames AI FinOps as an optimization loop rather than a procurement dashboard. This cell renders the workflow as a simple directed graph using Python so the telemetry-to-showback path is explicit and easy to validate."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "nodes = [\n",
        "    'GitHub Actions / Copilot Agents',\n",
        "    'Workflow Telemetry Export',\n",
        "    'Normalize Events',\n",
        "    'Token Efficiency KPIs',\n",
        "    'Showback by Team / Repo',\n",
        "    'Optimization Loop',\n",
        "    'Prompt hygiene',\n",
        "    'Retry reduction',\n",
        "    'Quality gates',\n",
        "    'FinOps dashboard'\n",
        "]\n",
        "\n",
        "edges = [\n",
        "    ('GitHub Actions / Copilot Agents', 'Workflow Telemetry Export'),\n",
        "    ('Workflow Telemetry Export', 'Normalize Events'),\n",
        "    ('Normalize Events', 'Token Efficiency KPIs'),\n",
        "    ('Token Efficiency KPIs', 'Showback by Team / Repo'),\n",
        "    ('Token Efficiency KPIs', 'Optimization Loop'),\n",
        "    ('Optimization Loop', 'Prompt hygiene'),\n",
        "    ('Optimization Loop', 'Retry reduction'),\n",
        "    ('Optimization Loop', 'Quality gates'),\n",
        "    ('Showback by Team / Repo', 'FinOps dashboard'),\n",
        "]\n",
        "\n",
        "print('Nodes:')\n",
        "for n in nodes:\n",
        "    print('-', n)\n",
        "\n",
        "print('\\nEdges:')\n",
        "for a, b in edges:\n",
        "    print(f'{a} -> {b}')"
      ],
      "execution_count": null,
      "outputs": []
    },
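    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a lightweight check of that claim, the next cell walks the edge list with a breadth-first search to confirm that telemetry emitted by the agents can actually reach the FinOps dashboard. The node names are the illustrative ones defined above, not a required taxonomy."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Breadth-first search over the edge list to confirm end-to-end reachability.\n",
        "from collections import deque\n",
        "\n",
        "adjacency = defaultdict(list)\n",
        "for a, b in edges:\n",
        "    adjacency[a].append(b)\n",
        "\n",
        "start, target = 'GitHub Actions / Copilot Agents', 'FinOps dashboard'\n",
        "seen, queue = {start}, deque([start])\n",
        "while queue:\n",
        "    current = queue.popleft()\n",
        "    for nxt in adjacency[current]:\n",
        "        if nxt not in seen:\n",
        "            seen.add(nxt)\n",
        "            queue.append(nxt)\n",
        "\n",
        "print(f'{target} reachable from {start}: {target in seen}')\n",
        "print('Unreached nodes:', sorted(set(nodes) - seen))"
      ],
      "execution_count": null,
      "outputs": []
    },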
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Compute token-efficiency KPIs per repository\n",
        "\n",
        "This example reproduces the blog's lightweight KPI calculation. It aggregates tokens, tasks, accepted PRs, retries, cost, and a simple quality-adjusted cost heuristic so you can compare repositories by useful outcome rather than by seat count."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Load exported workflow telemetry and compute token efficiency KPIs per repository.\n",
        "from collections import defaultdict\n",
        "\n",
        "rows = [\n",
        "    {'repo': 'api', 'task_completed': '1', 'pr_accepted': '1', 'retries': '0', 'tokens': '3200', 'cost': '0.64', 'quality': '0.92'},\n",
        "    {'repo': 'api', 'task_completed': '1', 'pr_accepted': '0', 'retries': '1', 'tokens': '4100', 'cost': '0.82', 'quality': '0.70'},\n",
        "    {'repo': 'web', 'task_completed': '1', 'pr_accepted': '1', 'retries': '0', 'tokens': '2100', 'cost': '0.42', 'quality': '0.95'},\n",
        "]\n",
        "\n",
        "kpis = defaultdict(lambda: {'tokens': 0, 'tasks': 0, 'accepted_prs': 0, 'retries': 0, 'cost': 0.0, 'quality_cost': 0.0})\n",
        "for r in rows:\n",
        "    repo = r['repo']\n",
        "    tokens = int(r['tokens'])\n",
        "    cost = float(r['cost'])\n",
        "    quality = float(r['quality'])\n",
        "    kpis[repo]['tokens'] += tokens\n",
        "    kpis[repo]['tasks'] += int(r['task_completed'])\n",
        "    kpis[repo]['accepted_prs'] += int(r['pr_accepted'])\n",
        "    kpis[repo]['retries'] += int(r['retries'])\n",
        "    kpis[repo]['cost'] += cost\n",
        "    kpis[repo]['quality_cost'] += cost / max(quality, 0.01)\n",
        "\n",
        "repo_kpi_rows = []\n",
        "for repo, v in kpis.items():\n",
        "    repo_kpi_rows.append({\n",
        "        'repo': repo,\n",
        "        'tokens_per_completed_task': round(v['tokens'] / max(v['tasks'], 1), 2),\n",
        "        'tokens_per_accepted_pr': round(v['tokens'] / max(v['accepted_prs'], 1), 2),\n",
        "        'retry_rate': round(v['retries'] / max(v['tasks'], 1), 2),\n",
        "        'quality_adjusted_cost': round(v['quality_cost'], 2),\n",
        "        'total_cost': round(v['cost'], 2),\n",
        "        'accepted_prs': v['accepted_prs'],\n",
        "        'tasks': v['tasks']\n",
        "    })\n",
        "\n",
        "repo_kpi_df = pd.DataFrame(repo_kpi_rows).sort_values('repo').reset_index(drop=True)\n",
        "repo_kpi_df"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Visualize repository KPI differences\n",
        "\n",
        "The blog argues that workflow design, not license allocation, explains much of the variance in AI spend. This chart makes that visible by plotting token efficiency and retry behavior side by side for each repository."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "fig, axes = plt.subplots(1, 2, figsize=(12, 4))\n",
        "\n",
        "sns.barplot(data=repo_kpi_df, x='repo', y='tokens_per_completed_task', ax=axes[0], palette='Blues_d')\n",
        "axes[0].set_title('Tokens per Completed Task')\n",
        "axes[0].set_xlabel('Repository')\n",
        "axes[0].set_ylabel('Tokens')\n",
        "\n",
        "sns.barplot(data=repo_kpi_df, x='repo', y='retry_rate', ax=axes[1], palette='Reds_d')\n",
        "axes[1].set_title('Retry Rate')\n",
        "axes[1].set_xlabel('Repository')\n",
        "axes[1].set_ylabel('Retries per Task')\n",
        "\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Normalize raw telemetry into a compact schema\n",
        "\n",
        "Before dashboards or scorecards, the blog recommends defining a minimum event schema. This example converts raw workflow events into a normalized structure with repo, team, tokens, task outcome, PR outcome, and retry count."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Normalize raw telemetry into a compact schema for downstream KPI calculations.\n",
        "raw_events = [\n",
        "    {'repository': 'api', 'team_slug': 'platform', 'input_tokens': 1200, 'output_tokens': 2000, 'status': 'completed', 'pr_state': 'merged', 'attempt': 1},\n",
        "    {'repository': 'api', 'team_slug': 'platform', 'input_tokens': 1500, 'output_tokens': 2600, 'status': 'completed', 'pr_state': 'closed', 'attempt': 2},\n",
        "]\n",
        "\n",
        "normalized = []\n",
        "for e in raw_events:\n",
        "    normalized.append({\n",
        "        'repo': e['repository'],\n",
        "        'team': e['team_slug'],\n",
        "        'tokens': e['input_tokens'] + e['output_tokens'],\n",
        "        'task_completed': 1 if e['status'] == 'completed' else 0,\n",
        "        'pr_accepted': 1 if e['pr_state'] == 'merged' else 0,\n",
        "        'retries': max(e['attempt'] - 1, 0),\n",
        "    })\n",
        "\n",
        "normalized_df = pd.DataFrame(normalized)\n",
        "normalized_df"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Validate the normalized schema with richer sample data\n",
        "\n",
        "To make the normalization step more useful for experimentation, this cell extends the sample with multiple teams and repositories. It then recomputes downstream KPIs from the normalized records to show how a compact schema supports governance analysis."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "raw_events_extended = [\n",
        "    {'repository': 'api', 'team_slug': 'platform', 'input_tokens': 1200, 'output_tokens': 2000, 'status': 'completed', 'pr_state': 'merged', 'attempt': 1},\n",
        "    {'repository': 'api', 'team_slug': 'platform', 'input_tokens': 1500, 'output_tokens': 2600, 'status': 'completed', 'pr_state': 'closed', 'attempt': 2},\n",
        "    {'repository': 'web', 'team_slug': 'growth', 'input_tokens': 900, 'output_tokens': 1200, 'status': 'completed', 'pr_state': 'merged', 'attempt': 1},\n",
        "    {'repository': 'web', 'team_slug': 'growth', 'input_tokens': 1100, 'output_tokens': 1500, 'status': 'completed', 'pr_state': 'merged', 'attempt': 1},\n",
        "    {'repository': 'data', 'team_slug': 'analytics', 'input_tokens': 1800, 'output_tokens': 2200, 'status': 'completed', 'pr_state': 'closed', 'attempt': 3},\n",
        "]\n",
        "\n",
        "normalized_extended = []\n",
        "for e in raw_events_extended:\n",
        "    normalized_extended.append({\n",
        "        'repo': e['repository'],\n",
        "        'team': e['team_slug'],\n",
        "        'tokens': e['input_tokens'] + e['output_tokens'],\n",
        "        'task_completed': 1 if e['status'] == 'completed' else 0,\n",
        "        'pr_accepted': 1 if e['pr_state'] == 'merged' else 0,\n",
        "        'retries': max(e['attempt'] - 1, 0),\n",
        "    })\n",
        "\n",
        "normalized_extended_df = pd.DataFrame(normalized_extended)\n",
        "summary_df = (\n",
        "    normalized_extended_df\n",
        "    .groupby(['team', 'repo'], as_index=False)\n",
        "    .agg(tokens=('tokens', 'sum'),\n",
        "         tasks=('task_completed', 'sum'),\n",
        "         accepted_prs=('pr_accepted', 'sum'),\n",
        "         retries=('retries', 'sum'))\n",
        ")\n",
        "summary_df['tokens_per_task'] = (summary_df['tokens'] / summary_df['tasks'].clip(lower=1)).round(2)\n",
        "summary_df['tokens_per_accepted_pr'] = (summary_df['tokens'] / summary_df['accepted_prs'].clip(lower=1)).round(2)\n",
        "summary_df['retry_rate'] = (summary_df['retries'] / summary_df['tasks'].clip(lower=1)).round(2)\n",
        "summary_df.sort_values(['team', 'repo']).reset_index(drop=True)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Detect inefficient agent sessions\n",
        "\n",
        "The blog suggests that high-token sessions with multiple retries and no accepted PR often indicate prompt or context problems. This example classifies sessions with simple heuristics to surface optimization candidates."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Detect inefficient agent sessions by flagging high-token retries and low acceptance outcomes.\n",
        "sessions = [\n",
        "    {'session_id': 's1', 'tokens': 1800, 'retries': 0, 'accepted_pr': 1},\n",
        "    {'session_id': 's2', 'tokens': 5200, 'retries': 3, 'accepted_pr': 0},\n",
        "    {'session_id': 's3', 'tokens': 4700, 'retries': 2, 'accepted_pr': 1},\n",
        "]\n",
        "\n",
        "def classify(session):\n",
        "    if session['tokens'] > 5000 and session['retries'] >= 2:\n",
        "        return 'prompt_or_context_issue'\n",
        "    if session['accepted_pr'] == 0 and session['tokens'] > 3000:\n",
        "        return 'low_quality_spend'\n",
        "    return 'healthy'\n",
        "\n",
        "session_results = []\n",
        "for s in sessions:\n",
        "    session_results.append({'session_id': s['session_id'], 'classification': classify(s)})\n",
        "\n",
        "pd.DataFrame(session_results)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Explore anomaly rules on a larger session sample\n",
        "\n",
        "Simple rules are not a mature governance model, but they are useful for triage. This cell expands the session sample, applies the same classification logic, and summarizes how many sessions fall into each category."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "sessions_extended = [\n",
        "    {'session_id': 's1', 'tokens': 1800, 'retries': 0, 'accepted_pr': 1},\n",
        "    {'session_id': 's2', 'tokens': 5200, 'retries': 3, 'accepted_pr': 0},\n",
        "    {'session_id': 's3', 'tokens': 4700, 'retries': 2, 'accepted_pr': 1},\n",
        "    {'session_id': 's4', 'tokens': 3400, 'retries': 1, 'accepted_pr': 0},\n",
        "    {'session_id': 's5', 'tokens': 2600, 'retries': 0, 'accepted_pr': 1},\n",
        "    {'session_id': 's6', 'tokens': 6100, 'retries': 4, 'accepted_pr': 0},\n",
        "]\n",
        "\n",
        "sessions_extended_df = pd.DataFrame(sessions_extended)\n",
        "sessions_extended_df['classification'] = sessions_extended_df.apply(classify, axis=1)\n",
        "sessions_extended_df"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "classification_counts = sessions_extended_df['classification'].value_counts().rename_axis('classification').reset_index(name='count')\n",
        "classification_counts"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "plt.figure(figsize=(7, 4))\n",
        "sns.barplot(data=classification_counts, x='classification', y='count', palette='viridis')\n",
        "plt.title('Session Classification Counts')\n",
        "plt.xlabel('Classification')\n",
        "plt.ylabel('Count')\n",
        "plt.xticks(rotation=15)\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Score token efficiency with a weighted index\n",
        "\n",
        "The blog proposes a directional scoring model that blends token efficiency, retry behavior, and quality-adjusted cost. This is not a standardized benchmark, but it is useful for comparing teams or repositories internally."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Score token efficiency with a weighted index to compare teams or repositories.\n",
        "def efficiency_score(tokens_per_task, retry_rate, quality_adjusted_cost):\n",
        "    token_component = max(0.0, 100 - (tokens_per_task / 100))\n",
        "    retry_component = max(0.0, 100 - (retry_rate * 100))\n",
        "    cost_component = max(0.0, 100 - (quality_adjusted_cost * 20))\n",
        "    return round((0.5 * token_component) + (0.2 * retry_component) + (0.3 * cost_component), 2)\n",
        "\n",
        "samples = [\n",
        "    {'name': 'team-api', 'tokens_per_task': 3650, 'retry_rate': 0.18, 'quality_adjusted_cost': 1.05},\n",
        "    {'name': 'team-web', 'tokens_per_task': 2200, 'retry_rate': 0.05, 'quality_adjusted_cost': 0.44},\n",
        "]\n",
        "\n",
        "score_rows = []\n",
        "for s in samples:\n",
        "    score_rows.append({\n",
        "        'name': s['name'],\n",
        "        'efficiency_score': efficiency_score(**{k: s[k] for k in ('tokens_per_task', 'retry_rate', 'quality_adjusted_cost')})\n",
        "    })\n",
        "\n",
        "pd.DataFrame(score_rows)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Sensitivity analysis for the weighted score\n",
        "\n",
        "Because the score is heuristic, it helps to see how it behaves across a range of token, retry, and cost values. This cell generates a small scenario grid so you can inspect how the index rewards or penalizes different workflow patterns."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "scenario_rows = []\n",
        "for tokens_per_task in [1500, 2500, 3500, 4500]:\n",
        "    for retry_rate in [0.0, 0.1, 0.25, 0.5]:\n",
        "        for quality_adjusted_cost in [0.4, 0.8, 1.2]:\n",
        "            scenario_rows.append({\n",
        "                'tokens_per_task': tokens_per_task,\n",
        "                'retry_rate': retry_rate,\n",
        "                'quality_adjusted_cost': quality_adjusted_cost,\n",
        "                'efficiency_score': efficiency_score(tokens_per_task, retry_rate, quality_adjusted_cost)\n",
        "            })\n",
        "\n",
        "scenario_df = pd.DataFrame(scenario_rows)\n",
        "scenario_df.head(12)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "pivot = scenario_df[scenario_df['quality_adjusted_cost'] == 0.8].pivot(index='tokens_per_task', columns='retry_rate', values='efficiency_score')\n",
        "plt.figure(figsize=(7, 5))\n",
        "sns.heatmap(pivot, annot=True, fmt='.1f', cmap='YlGnBu')\n",
        "plt.title('Efficiency Score Heatmap at Quality-Adjusted Cost = 0.8')\n",
        "plt.xlabel('Retry Rate')\n",
        "plt.ylabel('Tokens per Task')\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
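    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The 0.5/0.2/0.3 weights are themselves an assumption, so it helps to check how stable the ranking is under alternative weightings. This is a minimal sketch: the weight triples below are illustrative, not recommendations."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Recompute the sample scores under alternative (token, retry, cost) weightings.\n",
        "def weighted_score(tokens_per_task, retry_rate, quality_adjusted_cost, weights):\n",
        "    token_component = max(0.0, 100 - (tokens_per_task / 100))\n",
        "    retry_component = max(0.0, 100 - (retry_rate * 100))\n",
        "    cost_component = max(0.0, 100 - (quality_adjusted_cost * 20))\n",
        "    w_token, w_retry, w_cost = weights\n",
        "    return round((w_token * token_component) + (w_retry * retry_component) + (w_cost * cost_component), 2)\n",
        "\n",
        "weight_sets = [(0.5, 0.2, 0.3), (0.4, 0.4, 0.2), (0.6, 0.1, 0.3)]  # illustrative alternatives\n",
        "weight_rows = []\n",
        "for weights in weight_sets:\n",
        "    for s in samples:\n",
        "        weight_rows.append({\n",
        "            'weights': str(weights),\n",
        "            'name': s['name'],\n",
        "            'score': weighted_score(s['tokens_per_task'], s['retry_rate'], s['quality_adjusted_cost'], weights)\n",
        "        })\n",
        "\n",
        "pd.DataFrame(weight_rows).pivot(index='name', columns='weights', values='score')"
      ],
      "execution_count": null,
      "outputs": []
    },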
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Build a showback-style report by team and repository\n",
        "\n",
        "The blog recommends exposing showback by team or repository. This Python version of the reporting pattern aggregates telemetry into a governance-friendly table with token efficiency, retry rate, and total cost."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "events = [\n",
        "    {'Team': 'Platform', 'Repo': 'api', 'Tokens': 3200, 'Cost': 0.64, 'CompletedTask': 1, 'AcceptedPR': 1, 'Retries': 0},\n",
        "    {'Team': 'Platform', 'Repo': 'api', 'Tokens': 4100, 'Cost': 0.82, 'CompletedTask': 1, 'AcceptedPR': 0, 'Retries': 1},\n",
        "    {'Team': 'Growth', 'Repo': 'web', 'Tokens': 2100, 'Cost': 0.42, 'CompletedTask': 1, 'AcceptedPR': 1, 'Retries': 0},\n",
        "    {'Team': 'Analytics', 'Repo': 'data', 'Tokens': 3900, 'Cost': 0.91, 'CompletedTask': 1, 'AcceptedPR': 0, 'Retries': 2},\n",
        "]\n",
        "\n",
        "events_df = pd.DataFrame(events)\n",
        "showback_df = (\n",
        "    events_df\n",
        "    .groupby(['Team', 'Repo'], as_index=False)\n",
        "    .agg(Tokens=('Tokens', 'sum'),\n",
        "         TotalCost=('Cost', 'sum'),\n",
        "         Tasks=('CompletedTask', 'sum'),\n",
        "         AcceptedPRs=('AcceptedPR', 'sum'),\n",
        "         Retries=('Retries', 'sum'))\n",
        ")\n",
        "showback_df['TokensPerTask'] = (showback_df['Tokens'] / showback_df['Tasks'].clip(lower=1)).round(2)\n",
        "showback_df['TokensPerAcceptedPR'] = (showback_df['Tokens'] / showback_df['AcceptedPRs'].clip(lower=1)).round(2)\n",
        "showback_df['RetryRate'] = (showback_df['Retries'] / showback_df['Tasks'].clip(lower=1)).round(2)\n",
        "showback_df = showback_df[['Team', 'Repo', 'TokensPerTask', 'TokensPerAcceptedPR', 'RetryRate', 'TotalCost']]\n",
        "showback_df.sort_values(['Team', 'Repo']).reset_index(drop=True)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Add budget variance and anomaly flags\n",
        "\n",
        "A useful showback report often includes budget context and simple anomaly flags. This cell adds per-repository budgets, computes variance, and flags rows where token intensity or retry rate exceeds a threshold."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "budget_per_repo = {'api': 1.00, 'web': 0.75, 'data': 0.80}\n",
        "\n",
        "governance_df = showback_df.copy()\n",
        "governance_df['Budget'] = governance_df['Repo'].map(budget_per_repo)\n",
        "governance_df['Variance'] = (governance_df['TotalCost'] - governance_df['Budget']).round(2)\n",
        "governance_df['Anomaly'] = (governance_df['TokensPerTask'] > 3000) | (governance_df['RetryRate'] > 0.25)\n",
        "governance_df.sort_values(['Anomaly', 'Variance'], ascending=[False, False]).reset_index(drop=True)"
      ],
      "execution_count": null,
      "outputs": []
    },
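    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A chart often communicates the same governance table faster than raw numbers. This cell plots total cost against budget per repository using the `governance_df` computed above; the budgets remain the illustrative values from the previous cell."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Compare actual cost to budget per repository.\n",
        "plot_df = governance_df.melt(id_vars=['Repo'], value_vars=['TotalCost', 'Budget'], var_name='Measure', value_name='USD')\n",
        "\n",
        "plt.figure(figsize=(7, 4))\n",
        "sns.barplot(data=plot_df, x='Repo', y='USD', hue='Measure', palette='muted')\n",
        "plt.title('Total Cost vs Budget per Repository')\n",
        "plt.xlabel('Repository')\n",
        "plt.ylabel('Cost (USD)')\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },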
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Sequence of telemetry capture and feedback\n",
        "\n",
        "The blog also describes a sequence from developer request to workflow execution, telemetry storage, aggregation, and dashboard feedback. This cell expresses that sequence as ordered steps so the operating model is easy to inspect in notebook form."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "sequence_steps = [\n",
        "    ('Developer', 'Copilot Agent', 'Prompt / task request'),\n",
        "    ('Copilot Agent', 'GitHub Workflow', 'Create branch, run workflow'),\n",
        "    ('GitHub Workflow', 'Telemetry Store', 'Emit tokens, retries, outcome, cost'),\n",
        "    ('Copilot Agent', 'GitHub Workflow', 'Open PR'),\n",
        "    ('GitHub Workflow', 'Telemetry Store', 'Emit PR accepted/rejected signal'),\n",
        "    ('Telemetry Store', 'FinOps Dashboard', 'Aggregate by repo/team'),\n",
        "    ('FinOps Dashboard', 'Developer', 'Token efficiency insights'),\n",
        "]\n",
        "\n",
        "sequence_df = pd.DataFrame(sequence_steps, columns=['From', 'To', 'Action'])\n",
        "sequence_df.index = sequence_df.index + 1\n",
        "sequence_df"
      ],
      "execution_count": null,
      "outputs": []
    },
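    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick consistency check on the operating model, this cell counts how many messages each participant sends and receives in the sequence above; a participant that only receives may indicate a missing feedback edge."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Count messages sent and received per participant in the sequence.\n",
        "sent = sequence_df['From'].value_counts().rename('sent')\n",
        "received = sequence_df['To'].value_counts().rename('received')\n",
        "actor_summary = pd.concat([sent, received], axis=1).fillna(0).astype(int)\n",
        "actor_summary.index.name = 'participant'\n",
        "actor_summary"
      ],
      "execution_count": null,
      "outputs": []
    },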
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## End-to-end anomaly pipeline\n",
        "\n",
        "This final validation example mirrors the blog's flow from raw logs to normalized schema, KPI computation, threshold checks, and optimization backlog. It demonstrates how a small rules engine can turn telemetry into actionable governance signals."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "raw_logs = pd.DataFrame([\n",
        "    {'repo': 'api', 'team': 'platform', 'tokens': 3200, 'accepted_pr': 1, 'retries': 0},\n",
        "    {'repo': 'api', 'team': 'platform', 'tokens': 4100, 'accepted_pr': 0, 'retries': 1},\n",
        "    {'repo': 'web', 'team': 'growth', 'tokens': 2100, 'accepted_pr': 1, 'retries': 0},\n",
        "    {'repo': 'data', 'team': 'analytics', 'tokens': 5200, 'accepted_pr': 0, 'retries': 3},\n",
        "])\n",
        "\n",
        "normalized_logs = raw_logs.copy()\n",
        "normalized_logs['task_completed'] = 1\n",
        "\n",
        "kpi_logs = (\n",
        "    normalized_logs\n",
        "    .groupby(['team', 'repo'], as_index=False)\n",
        "    .agg(tokens=('tokens', 'sum'),\n",
        "         tasks=('task_completed', 'sum'),\n",
        "         accepted_prs=('accepted_pr', 'sum'),\n",
        "         retries=('retries', 'sum'))\n",
        ")\n",
        "kpi_logs['tokens_per_task'] = (kpi_logs['tokens'] / kpi_logs['tasks'].clip(lower=1)).round(2)\n",
        "kpi_logs['retry_rate'] = (kpi_logs['retries'] / kpi_logs['tasks'].clip(lower=1)).round(2)\n",
        "kpi_logs['anomaly'] = (kpi_logs['tokens_per_task'] > 3000) | (kpi_logs['retry_rate'] > 0.25) | (kpi_logs['accepted_prs'] == 0)\n",
        "kpi_logs['status'] = kpi_logs['anomaly'].map({True: 'Flag anomaly', False: 'Mark healthy'})\n",
        "kpi_logs['next_action'] = kpi_logs['anomaly'].map({True: 'Showback + optimization backlog', False: 'Monitor'})\n",
        "kpi_logs.sort_values(['anomaly', 'tokens_per_task'], ascending=[False, False]).reset_index(drop=True)"
      ],
      "execution_count": null,
      "outputs": []
    },
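    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To close the loop, this sketch turns flagged rows into a small optimization backlog. The action wording is illustrative and maps loosely to the blog's levers of prompt hygiene, retry reduction, and quality gates."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Turn flagged KPI rows into a simple optimization backlog.\n",
        "backlog = []\n",
        "for _, row in kpi_logs[kpi_logs['anomaly']].iterrows():\n",
        "    actions = []\n",
        "    if row['tokens_per_task'] > 3000:\n",
        "        actions.append('review prompt and context size')\n",
        "    if row['retry_rate'] > 0.25:\n",
        "        actions.append('investigate retry loops')\n",
        "    if row['accepted_prs'] == 0:\n",
        "        actions.append('add quality gates before opening PRs')\n",
        "    backlog.append({'team': row['team'], 'repo': row['repo'], 'actions': '; '.join(actions)})\n",
        "\n",
        "pd.DataFrame(backlog)"
      ],
      "execution_count": null,
      "outputs": []
    },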
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Next Steps\n",
        "\n",
        "This notebook validated the blog's core claim: token efficiency is a more operationally meaningful AI FinOps metric than seat count alone. By normalizing telemetry, computing tokens per useful outcome, tracking retries and acceptance, and adding simple anomaly rules, you can start treating workflow design as a financial control system.\n",
        "\n",
        "Suggested next steps:\n",
        "- Define a minimum event schema for repo, team, tokens, attempts, task outcome, and PR outcome.\n",
        "- Export telemetry from your real GitHub and agent workflows into a central store.\n",
        "- Replace toy heuristics with organization-specific thresholds and quality signals.\n",
        "- Add review burden, CI reruns, rollback indicators, and retention metrics.\n",
        "- Build trend-based showback dashboards by workflow type, team, and repository.\n",
        "- Decide which metric you trust most first: tokens per merged change, retry rate, or review burden."
      ]
    }
  ]
}