Building a Local LLM Research Agent: Privacy-First AI with SQL, RAG, and MCP

What if you could query databases, search documents, and run multi-tool AI workflows — all without a single byte leaving your machine? That's exactly what I built with the Local LLM Universal Research Agent.

The Problem

Most AI agent frameworks assume cloud APIs. That's fine for many use cases, but when you're working with sensitive data — internal databases, proprietary documents, client information — sending that data to OpenAI or Anthropic isn't always an option. I needed an agent that could:

  • Query SQL databases using natural language
  • Search document collections via RAG
  • Use extensible tools via the Model Context Protocol (MCP)
  • Run entirely locally with zero cloud dependencies

Architecture

The agent is built on a modern Python stack with a React frontend:

Layer             | Technology                            | Purpose
------------------|---------------------------------------|------------------------------------------------------
Frontend          | React 19 + Vite + TypeScript          | Chat UI with model switching, tool status, streaming
Backend           | FastAPI + SQLAlchemy + Alembic        | REST API with WebSocket streaming
Agent Framework   | Pydantic AI                           | Structured agent with tool calling
LLM Runtime       | Ollama or Foundry Local               | Local model inference
Vector Store      | SQL Server 2025 (native VECTOR type)  | Embeddings for RAG
Tool Integration  | Model Context Protocol (MCP)          | Extensible tool architecture
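
To ground the backend layer, here's a minimal sketch of a FastAPI WebSocket endpoint that streams agent output to the chat UI. The /ws/chat path, the message shape, and the run_agent_stream helper are illustrative assumptions, not the repo's actual API:

# Hypothetical sketch: streaming agent output over a FastAPI WebSocket.
# The /ws/chat path and run_agent_stream() helper are assumptions for illustration.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def run_agent_stream(prompt: str):
    """Placeholder for the real agent call; yields response chunks."""
    for chunk in ("You asked: ", prompt):
        yield chunk

@app.websocket("/ws/chat")
async def chat(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            prompt = await ws.receive_text()  # one user message per turn
            async for chunk in run_agent_stream(prompt):
                await ws.send_json({"type": "token", "content": chunk})
            await ws.send_json({"type": "done"})  # tell the UI the turn is finished
    except WebSocketDisconnect:
        pass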

Key Features

Natural Language SQL

Ask questions about your data in plain English. The agent translates your intent into SQL, executes it against your database, and returns human-readable results:

User: "What were the top 5 products by revenue last quarter?"

Agent: [Calls SQL tool] → SELECT TOP 5 ProductName, SUM(Revenue) AS TotalRevenue
       FROM Sales
       WHERE OrderDate >= '2025-10-01' AND OrderDate < '2026-01-01'
       GROUP BY ProductName ORDER BY TotalRevenue DESC

→ Returns formatted table with results
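
Behind that call sits a tool the agent can invoke. Here's a rough sketch of a read-only SQL tool; the connection string, driver, and guardrail are illustrative assumptions, not the repo's implementation:

# Hypothetical sketch of a read-only SQL tool the agent could call.
import pyodbc

# Illustrative connection string; the real one lives in the repo's configuration.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=localhost;Database=research;Trusted_Connection=yes;"
)

def run_sql_query(query: str, max_rows: int = 50) -> list[dict]:
    """Execute a SELECT statement and return rows as a list of dictionaries."""
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")  # simple read-only guard
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        cursor.execute(query)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchmany(max_rows)]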

Dual LLM Provider Support

Switch between Ollama and Microsoft Foundry Local on the fly. The model switcher in the UI lets you pick the best model for the task — use a large model for complex reasoning, or switch to a fast model for quick lookups:

# Provider abstraction — swap with zero code changes
from src.providers import get_provider

# user_message and available_tools come from the incoming chat request
provider = get_provider("ollama")  # or "foundry"
response = await provider.generate(  # called inside an async request handler
    model="qwen2.5:32b",
    prompt=user_message,
    tools=available_tools,
    stream=True
)
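
The abstraction behind get_provider is essentially a shared interface that both the Ollama and Foundry Local backends implement. A sketch of what that interface could look like (the exact signature in src.providers may differ):

# Hypothetical sketch of the provider interface; the real one in src.providers may differ.
from typing import AsyncIterator, Protocol

class LLMProvider(Protocol):
    """Shared interface the Ollama and Foundry Local providers could implement."""

    async def generate(
        self,
        model: str,
        prompt: str,
        tools: list[dict] | None = None,
        stream: bool = False,
    ) -> AsyncIterator[str]:
        """Return an async iterator of response chunks from the local runtime."""
        ...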

MCP Tool Integration

The Model Context Protocol gives the agent access to extensible tools. Out of the box, it connects to an MSSQL MCP server for database operations, but the architecture supports any MCP-compatible tool — file systems, APIs, web search, custom business logic:

{
  "mcpServers": {
    "mssql": {
      "command": "npx",
      "args": ["-y", "@anthropic/mcp-server-mssql"],
      "env": {
        "MSSQL_CONNECTION_STRING": "Server=localhost;Database=research;..."
      }
    }
  }
}
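
On the Python side, the backend talks to that server over stdio. Here's a rough sketch using the official mcp Python SDK directly, independent of how Pydantic AI wires it in; the tool name in the commented call is an assumption:

# Hypothetical sketch: connecting to the MSSQL MCP server over stdio
# with the official `mcp` Python SDK. Tool names and arguments are assumptions.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(
        command="npx",
        args=["-y", "@anthropic/mcp-server-mssql"],
        env={"MSSQL_CONNECTION_STRING": "Server=localhost;Database=research;..."},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # discover what the server exposes
            print([t.name for t in tools.tools])
            # result = await session.call_tool("query", {"sql": "SELECT 1"})

asyncio.run(main())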

Hybrid Search with SQL Server 2025

The agent uses SQL Server 2025's native VECTOR type for embeddings, combined with full-text search for keyword matching. A Reciprocal Rank Fusion (RRF) procedure merges both result sets, giving better retrieval than either method alone:

-- Hybrid search: semantic + keyword with RRF fusion
CREATE PROCEDURE sp_HybridSearch
    @query NVARCHAR(MAX),
    @query_vector VECTOR(1536),
    @top_k INT = 10
AS
BEGIN
    -- Semantic candidates: rank documents by cosine distance to the query vector
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY VECTOR_DISTANCE('cosine', embedding, @query_vector)) AS sem_rank
    INTO #semantic
    FROM documents;

    -- Keyword candidates: rank full-text matches by relevance
    SELECT [KEY] AS id,
           ROW_NUMBER() OVER (ORDER BY [RANK] DESC) AS ft_rank
    INTO #fulltext
    FROM CONTAINSTABLE(documents, content, @query);

    -- RRF fusion: sum of reciprocal ranks (k = 60), keep the best @top_k
    SELECT TOP (@top_k)
           COALESCE(s.id, f.id) AS id,
           COALESCE(1.0 / (60 + s.sem_rank), 0) + COALESCE(1.0 / (60 + f.ft_rank), 0) AS rrf_score
    FROM #semantic s
    FULL OUTER JOIN #fulltext f ON s.id = f.id
    ORDER BY rrf_score DESC;
END
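
The fusion step is easier to read outside T-SQL. Here's the same reciprocal-rank calculation in plain Python, with the same k = 60 constant as the procedure above:

# The same Reciprocal Rank Fusion step in plain Python, for clarity.
def rrf_fuse(semantic_ids: list[int], fulltext_ids: list[int], k: int = 60) -> list[int]:
    """Merge two ranked id lists by summing reciprocal ranks, best first."""
    scores: dict[int, float] = {}
    for ranked in (semantic_ids, fulltext_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: doc 7 ranks high in both lists, so it wins the fused ranking.
print(rrf_fuse([7, 3, 9], [2, 7, 3]))  # [7, 3, 2, 9]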

Getting Started

The repo is fully containerized with Docker Compose. One command brings up everything:

# Clone and start
git clone https://github.com/fgarofalo56/local-llm-research-agent.git
cd local-llm-research-agent

# Start all services (SQL Server, backend, frontend)
docker compose up -d

# Install a model via Ollama
ollama pull qwen2.5:32b
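
Once the model is pulled, a quick sanity check from Python confirms the local runtime is responding (this assumes the ollama Python client is installed via pip install ollama):

# Quick smoke test against the local Ollama runtime.
import ollama

response = ollama.chat(
    model="qwen2.5:32b",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(response["message"]["content"])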

Why It Matters

Privacy-first AI isn't about being anti-cloud — it's about having options. When you can run sophisticated AI workflows locally, you can:

  • Process sensitive data without compliance concerns
  • Develop and test AI features without API costs
  • Work offline or in air-gapped environments
  • Understand exactly what your AI is doing under the hood

The full source is on GitHub: fgarofalo56/local-llm-research-agent. Star it, fork it, break it, build on it.
