Building a Local LLM Research Agent: Privacy-First AI with SQL, RAG, and MCP
What if you could query databases, search documents, and run multi-tool AI workflows — all without a single byte leaving your machine? That's exactly what I built with the Local LLM Universal Research Agent.
The Problem
Most AI agent frameworks assume cloud APIs. That's fine for many use cases, but when you're working with sensitive data — internal databases, proprietary documents, client information — sending that data to OpenAI or Anthropic isn't always an option. I needed an agent that could:
- Query SQL databases using natural language
- Search document collections via RAG
- Use extensible tools via the Model Context Protocol (MCP)
- Run entirely locally with zero cloud dependencies
Architecture
The agent is built on a modern Python stack with a React frontend:
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 19 + Vite + TypeScript | Chat UI with model switching, tool status, streaming |
| Backend | FastAPI + SQLAlchemy + Alembic | REST API with WebSocket streaming |
| Agent Framework | Pydantic AI | Structured agent with tool calling |
| LLM Runtime | Ollama or Foundry Local | Local model inference |
| Vector Store | SQL Server 2025 (native VECTOR type) | Embeddings for RAG |
| Tool Integration | Model Context Protocol (MCP) | Extensible tool architecture |
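The backend streams tokens over a WebSocket so the UI can render responses as they are generated. Here is a minimal sketch of that pattern in FastAPI; the route path and the run_agent_stream helper are illustrative stand-ins, not the repo's actual names:

from collections.abc import AsyncIterator
from fastapi import FastAPI, WebSocket

app = FastAPI()

async def run_agent_stream(prompt: str) -> AsyncIterator[str]:
    # Stand-in for the agent call; the real backend would stream
    # tokens from the local model (Ollama or Foundry Local) here.
    for word in f"echo: {prompt}".split():
        yield word + " "

@app.websocket("/ws/chat")
async def chat(ws: WebSocket) -> None:
    await ws.accept()
    prompt = await ws.receive_text()
    # Forward each token to the browser as soon as it arrives
    async for token in run_agent_stream(prompt):
        await ws.send_text(token)
    await ws.close()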
Key Features
Natural Language SQL
Ask questions about your data in plain English. The agent translates your intent into SQL, executes it against your database, and returns human-readable results:
User: "What were the top 5 products by revenue last quarter?"
Agent: [Calls SQL tool] → SELECT TOP 5 ProductName, SUM(Revenue) as TotalRevenue
FROM Sales WHERE OrderDate >= '2025-10-01'
GROUP BY ProductName ORDER BY TotalRevenue DESC
→ Returns formatted table with results
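Under the hood this is ordinary tool calling. A minimal sketch of what such a tool can look like with Pydantic AI and pyodbc follows; the tool name, system prompt, and connection string are illustrative, not the repo's actual code:

import pyodbc
from pydantic_ai import Agent

agent = Agent(
    "openai:qwen2.5:32b",  # Ollama exposes an OpenAI-compatible endpoint locally
    system_prompt="Answer questions by querying the database with run_sql.",
)

@agent.tool_plain
def run_sql(query: str) -> list[dict]:
    """Execute a read-only T-SQL query and return the rows as dicts."""
    conn = pyodbc.connect("Driver={ODBC Driver 18 for SQL Server};Server=localhost;...")
    cursor = conn.cursor()
    cursor.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

The model sees the tool's signature and docstring, decides when to call it, and gets the rows back as context for its final answer.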
Dual LLM Provider Support
Switch between Ollama and Microsoft Foundry Local on the fly. The model switcher in the UI lets you pick the best model for the task — use a large model for complex reasoning, switch to a fast model for quick lookups:
# Provider abstraction — swap with zero code changes
from src.providers import get_provider
provider = get_provider("ollama") # or "foundry"
response = await provider.generate(
model="qwen2.5:32b",
prompt=user_message,
tools=available_tools,
stream=True
)
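The trick that makes the swap free is a shared interface: both providers satisfy the same contract, so call sites never branch on the runtime. One plausible shape for that contract (a sketch; the repo's actual class names may differ):

from collections.abc import AsyncIterator
from typing import Protocol

class LLMProvider(Protocol):
    """Contract both backends implement; callers depend only on this."""

    async def generate(
        self,
        model: str,
        prompt: str,
        tools: list | None = None,
        stream: bool = False,
    ) -> AsyncIterator[str]:
        ...

Any object with a matching generate method type-checks as an LLMProvider, so adding a third runtime is one new class, not a refactor.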
MCP Tool Integration
The Model Context Protocol gives the agent access to extensible tools. Out of the box, it connects to an MSSQL MCP server for database operations, but the architecture supports any MCP-compatible tool, from file systems and APIs to web search and custom business logic:
{
"mcpServers": {
"mssql": {
"command": "npx",
"args": ["-y", "@anthropic/mcp-server-mssql"],
"env": {
"MSSQL_CONNECTION_STRING": "Server=localhost;Database=research;..."
}
}
}
}
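On the Python side, Pydantic AI can launch and talk to that server directly. Here is a sketch using its MCP client; API names like MCPServerStdio and run_mcp_servers come from pydantic-ai's MCP support and have shifted between releases, so check the version you have installed:

import asyncio

from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

# The same server the JSON config above describes, started as a stdio subprocess
mssql = MCPServerStdio(
    "npx",
    args=["-y", "@anthropic/mcp-server-mssql"],
    env={"MSSQL_CONNECTION_STRING": "Server=localhost;Database=research;..."},
)

agent = Agent("openai:qwen2.5:32b", mcp_servers=[mssql])

async def main() -> None:
    async with agent.run_mcp_servers():  # starts the server, stops it on exit
        result = await agent.run("Which tables exist in the research database?")
        print(result.output)

asyncio.run(main())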
Hybrid Search with SQL Server 2025
The agent uses SQL Server 2025's native VECTOR type for embeddings, combined with full-text search for keyword matching. A Reciprocal Rank Fusion (RRF) procedure merges both result sets for superior retrieval quality:
-- Hybrid search: semantic + keyword with RRF fusion
CREATE PROCEDURE sp_HybridSearch
    @query NVARCHAR(MAX),
    @query_vector VECTOR(1536),
    @top_k INT = 10
AS
BEGIN
    -- Rank documents by cosine distance to the query embedding
    -- (smaller distance = more similar, so ascending order)
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY VECTOR_DISTANCE('cosine', embedding, @query_vector)) AS sem_rank
    INTO #semantic
    FROM documents;

    -- Rank documents by full-text relevance (higher RANK = better match)
    SELECT ft.[KEY] AS id,
           ROW_NUMBER() OVER (ORDER BY ft.[RANK] DESC) AS ft_rank
    INTO #fulltext
    FROM CONTAINSTABLE(documents, content, @query) AS ft;

    -- RRF fusion: a document found by only one method still scores,
    -- hence the FULL OUTER JOIN and the COALESCEs
    SELECT TOP (@top_k)
           COALESCE(s.id, f.id) AS id,
           COALESCE(1.0 / (60 + s.sem_rank), 0) + COALESCE(1.0 / (60 + f.ft_rank), 0) AS rrf_score
    FROM #semantic s
    FULL OUTER JOIN #fulltext f ON s.id = f.id
    ORDER BY rrf_score DESC;
END
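Calling the procedure from Python is then a matter of embedding the query locally and shipping the vector over as JSON. A sketch, assuming pyodbc, the ollama Python client, and an embedding model whose output dimension matches VECTOR(1536) (the model name below is a placeholder):

import json

import ollama
import pyodbc

def hybrid_search(query: str, top_k: int = 10) -> list:
    # Embed the query with a local model; its dimension must match
    # the VECTOR(1536) declaration in sp_HybridSearch
    emb = ollama.embeddings(model="your-embedding-model", prompt=query)["embedding"]
    conn = pyodbc.connect("Driver={ODBC Driver 18 for SQL Server};Server=localhost;...")
    cursor = conn.cursor()
    # SQL Server 2025 can cast a JSON array string to the native VECTOR type
    cursor.execute(
        "EXEC sp_HybridSearch @query = ?, @query_vector = CAST(? AS VECTOR(1536)), @top_k = ?",
        query, json.dumps(emb), top_k,
    )
    return cursor.fetchall()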
Getting Started
The repo is fully containerized with Docker Compose. Clone it, bring up the stack, and pull a model:
# Clone and start
git clone https://github.com/fgarofalo56/local-llm-research-agent.git
cd local-llm-research-agent
# Start all services (SQL Server, backend, frontend)
docker compose up -d
# Install a model via Ollama
ollama pull qwen2.5:32b
Why It Matters
Privacy-first AI isn't about being anti-cloud — it's about having options. When you can run sophisticated AI workflows locally, you can:
- Process sensitive data without compliance concerns
- Develop and test AI features without API costs
- Work offline or in air-gapped environments
- Understand exactly what your AI is doing under the hood
The full source is on GitHub: fgarofalo56/local-llm-research-agent. Star it, fork it, break it, build on it.