Building a Local LLM Research Agent: Privacy-First AI with SQL, RAG, and MCP

What if you could query databases, search documents, and run multi-tool AI workflows — all without a single byte leaving your machine? That's exactly what I built with the Local LLM Universal Research Agent.

The Problem

Most AI agent frameworks assume cloud APIs. That's fine for many use cases, but when you're working with sensitive data — internal databases, proprietary documents, client information — sending that data to OpenAI or Anthropic isn't always an option. I needed an agent that could:

  • Query SQL databases using natural language
  • Search document collections via RAG
  • Use extensible tools via the Model Context Protocol (MCP)
  • Run entirely locally with zero cloud dependencies

Architecture

The agent is built on a modern Python stack with a React frontend:

Layer             | Technology                            | Purpose
------------------|---------------------------------------|------------------------------------------------------
Frontend          | React 19 + Vite + TypeScript          | Chat UI with model switching, tool status, streaming
Backend           | FastAPI + SQLAlchemy + Alembic        | REST API with WebSocket streaming
Agent Framework   | Pydantic AI                           | Structured agent with tool calling
LLM Runtime       | Ollama or Foundry Local               | Local model inference
Vector Store      | SQL Server 2025 (native VECTOR type)  | Embeddings for RAG
Tool Integration  | Model Context Protocol (MCP)          | Extensible tool architecture
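
To ground the backend layer, here's a minimal sketch of a FastAPI WebSocket endpoint that streams agent output to the chat UI. The /ws/chat path, the message shape, and the run_agent_stream helper are illustrative assumptions, not the repo's actual API:

# Hypothetical sketch: streaming agent output over a FastAPI WebSocket.
# The /ws/chat path and run_agent_stream() helper are assumptions for illustration.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def run_agent_stream(prompt: str):
    """Placeholder for the real agent call; yields response chunks."""
    for chunk in ("You asked: ", prompt):
        yield chunk

@app.websocket("/ws/chat")
async def chat(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            prompt = await ws.receive_text()  # one user message per turn
            async for chunk in run_agent_stream(prompt):
                await ws.send_json({"type": "token", "content": chunk})
            await ws.send_json({"type": "done"})  # tell the UI the turn is finished
    except WebSocketDisconnect:
        pass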

Key Features

Natural Language SQL

Ask questions about your data in plain English. The agent translates your intent into SQL, executes it against your database, and returns human-readable results:

User: "What were the top 5 products by revenue last quarter?"

Agent: [Calls SQL tool] → SELECT TOP 5 ProductName, SUM(Revenue) AS TotalRevenue
       FROM Sales
       WHERE OrderDate >= '2025-10-01' AND OrderDate < '2026-01-01'
       GROUP BY ProductName ORDER BY TotalRevenue DESC

→ Returns formatted table with results
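
Behind that call sits a tool the agent can invoke. Here's a rough sketch of a read-only SQL tool; the connection string, driver, and guardrail are illustrative assumptions, not the repo's implementation:

# Hypothetical sketch of a read-only SQL tool the agent could call.
import pyodbc

# Illustrative connection string; the real one lives in the repo's configuration.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=localhost;Database=research;Trusted_Connection=yes;"
)

def run_sql_query(query: str, max_rows: int = 50) -> list[dict]:
    """Execute a SELECT statement and return rows as a list of dictionaries."""
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")  # simple read-only guard
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        cursor.execute(query)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchmany(max_rows)]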

Dual LLM Provider Support

Switch between Ollama and Microsoft Foundry Local on the fly. The model switcher in the UI lets you pick the best model for the task — use a large model for complex reasoning, or switch to a fast model for quick lookups:

# Provider abstraction — swap with zero code changes
from src.providers import get_provider

# user_message and available_tools come from the incoming chat request
provider = get_provider("ollama")  # or "foundry"
response = await provider.generate(  # called inside an async request handler
    model="qwen2.5:32b",
    prompt=user_message,
    tools=available_tools,
    stream=True
)
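
The abstraction behind get_provider is essentially a shared interface that both the Ollama and Foundry Local backends implement. A sketch of what that interface could look like (the exact signature in src.providers may differ):

# Hypothetical sketch of the provider interface; the real one in src.providers may differ.
from typing import AsyncIterator, Protocol

class LLMProvider(Protocol):
    """Shared interface the Ollama and Foundry Local providers could implement."""

    async def generate(
        self,
        model: str,
        prompt: str,
        tools: list[dict] | None = None,
        stream: bool = False,
    ) -> AsyncIterator[str]:
        """Return an async iterator of response chunks from the local runtime."""
        ...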

MCP Tool Integration

The Model Context Protocol gives the agent access to extensible tools. Out of the box, it connects to an MSSQL MCP server for database operations, but the architecture supports any MCP-compatible tool — file systems, APIs, web search, custom business logic:

{
  "mcpServers": {
    "mssql": {
      "command": "npx",
      "args": ["-y", "@anthropic/mcp-server-mssql"],
      "env": {
        "MSSQL_CONNECTION_STRING": "Server=localhost;Database=research;..."
      }
    }
  }
}
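
On the Python side, the backend talks to that server over stdio. Here's a rough sketch using the official mcp Python SDK directly, independent of how Pydantic AI wires it in; the tool name in the commented call is an assumption:

# Hypothetical sketch: connecting to the MSSQL MCP server over stdio
# with the official `mcp` Python SDK. Tool names and arguments are assumptions.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(
        command="npx",
        args=["-y", "@anthropic/mcp-server-mssql"],
        env={"MSSQL_CONNECTION_STRING": "Server=localhost;Database=research;..."},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # discover what the server exposes
            print([t.name for t in tools.tools])
            # result = await session.call_tool("query", {"sql": "SELECT 1"})

asyncio.run(main())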

Hybrid Search with SQL Server 2025

The agent uses SQL Server 2025's native VECTOR type for embeddings, combined with full-text search for keyword matching. A Reciprocal Rank Fusion (RRF) procedure merges both result sets, giving better retrieval than either method alone:

-- Hybrid search: semantic + keyword with RRF fusion
CREATE PROCEDURE sp_HybridSearch
    @query NVARCHAR(MAX),
    @query_vector VECTOR(1536),
    @top_k INT = 10
AS
BEGIN
    -- Semantic candidates: rank documents by cosine distance to the query vector
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY VECTOR_DISTANCE('cosine', embedding, @query_vector)) AS sem_rank
    INTO #semantic
    FROM documents;

    -- Keyword candidates: rank full-text matches by relevance
    SELECT [KEY] AS id,
           ROW_NUMBER() OVER (ORDER BY [RANK] DESC) AS ft_rank
    INTO #fulltext
    FROM CONTAINSTABLE(documents, content, @query);

    -- RRF fusion: sum of reciprocal ranks (k = 60), keep the best @top_k
    SELECT TOP (@top_k)
           COALESCE(s.id, f.id) AS id,
           COALESCE(1.0 / (60 + s.sem_rank), 0) + COALESCE(1.0 / (60 + f.ft_rank), 0) AS rrf_score
    FROM #semantic s
    FULL OUTER JOIN #fulltext f ON s.id = f.id
    ORDER BY rrf_score DESC;
END
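
The fusion step is easier to read outside T-SQL. Here's the same reciprocal-rank calculation in plain Python, with the same k = 60 constant as the procedure above:

# The same Reciprocal Rank Fusion step in plain Python, for clarity.
def rrf_fuse(semantic_ids: list[int], fulltext_ids: list[int], k: int = 60) -> list[int]:
    """Merge two ranked id lists by summing reciprocal ranks, best first."""
    scores: dict[int, float] = {}
    for ranked in (semantic_ids, fulltext_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: doc 7 ranks high in both lists, so it wins the fused ranking.
print(rrf_fuse([7, 3, 9], [2, 7, 3]))  # [7, 3, 2, 9]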

Getting Started

The repo is fully containerized with Docker Compose. One command brings up everything:

# Clone and start
git clone https://github.com/fgarofalo56/local-llm-research-agent.git
cd local-llm-research-agent

# Start all services (SQL Server, backend, frontend)
docker compose up -d

# Install a model via Ollama
ollama pull qwen2.5:32b
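
Once the model is pulled, a quick sanity check from Python confirms the local runtime is responding (this assumes the ollama Python client is installed via pip install ollama):

# Quick smoke test against the local Ollama runtime.
import ollama

response = ollama.chat(
    model="qwen2.5:32b",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(response["message"]["content"])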

Why It Matters

Privacy-first AI isn't about being anti-cloud — it's about having options. When you can run sophisticated AI workflows locally, you can:

  • Process sensitive data without compliance concerns
  • Develop and test AI features without API costs
  • Work offline or in air-gapped environments
  • Understand exactly what your AI is doing under the hood

The full source is on GitHub: fgarofalo56/local-llm-research-agent. Star it, fork it, break it, build on it.
