Architecture Overview
Technical architecture of the ensemble-mcp server — a harness infrastructure layer for AI agent pipelines.
What is an Agent Harness?
Agent = Model + Harness. A harness is every piece of code, configuration, and execution logic that wraps a model to make it useful. Without a harness, a model can only take in text and output text — it can't maintain state, execute code, or learn from past work.
ensemble-mcp is specifically the intelligence infrastructure layer of the harness — it provides memory, skills, drift detection, model routing, context management, and session persistence. The execution layer (filesystem, bash, sandbox) is provided by the host agent tool (Claude Code, Cursor, Codex, etc.).
Note
For a deeper dive on harness concepts, see The Anatomy of an Agent Harness by LangChain.
Harness Component Mapping
The following diagram shows how ensemble-mcp's subpackages map to harness primitives:
High-Level Design
ensemble-mcp is a Python MCP server that runs locally as a stdio process. It provides 19 tools across 8 categories, all backed by local ONNX embeddings and SQLite storage.
Package Layout
The project uses a src layout with the package at src/ensemble_mcp/:
src/ensemble_mcp/
├── __init__.py
├── __main__.py # CLI entry point (argparse, 6 subcommands)
├── server.py # MCP server setup, tool registration, dispatch
│
├── config/ # Configuration management
│ ├── defaults.py # Constants, thresholds, paths
│ └── settings.py # Layered config loader (TOML + env vars)
│
├── contracts/ # Response envelope and error taxonomy
│ ├── envelope.py # {ok, data, error, meta} wrapper + @tool_handler
│ └── errors.py # ErrorCode enum, ToolError, retry guidance
│
├── memory/ # Embedding and vector storage
│ ├── embeddings.py # ONNX Runtime model loading and inference
│ ├── similarity.py # Cosine similarity, top-K search
│ ├── store.py # VectorStore class (SQLite + embeddings)
│ └── schema.py # DDL: CREATE TABLE statements
│
├── security/ # Input/output safety
│ ├── redaction.py # Secret/PII redaction
│ └── trust.py # Confirmation requirement enforcement
│
├── state/ # Session and lifecycle management
│ ├── idempotency.py # Idempotency key check/store
│ ├── lifecycle.py # SessionState/StepState enums, transitions
│ └── locks.py # SQLite connection management (WAL, threading)
│
├── tools/ # MCP tool implementations (19 tools)
│ ├── patterns.py # patterns_search, patterns_store, patterns_prune
│ ├── drift.py # drift_check
│ ├── routing.py # model_recommend
│ ├── skills.py # skills_discover, skills_suggest, skills_generate
│ ├── session.py # session_save, session_load, session_search
│ ├── indexer.py # project_index, project_query, project_dependencies, project_snapshot
│ ├── compress.py # context_compress, context_prepare
│ └── mcp_tracking.py # Record MCP call history
│
├── compress/ # Context compression engine
│ ├── __init__.py # Package init, re-exports compress + CompressResult
│ ├── engine.py # Rule-based text compression pipeline
│ ├── preservers.py # Regex patterns for preserving technical content
│ └── tokens.py # Token counting via HuggingFace tokenizer
│
├── installer/ # Auto-installer for AI tools
│ ├── __init__.py # ToolDefinition, InstallPlan, result types
│ ├── setup.py # detect → plan → confirm → execute flow
│ ├── registry.py # Config file read/write (JSON, TOML)
│ └── agents.py # Agent/skill file discovery and copying
│
├── dashboard/ # Web dashboard
│ ├── __init__.py # start_dashboard() export
│ ├── server.py # aiohttp app creation, static file serving
│ ├── api.py # 25+ JSON API endpoints
│ └── static/ # SPA frontend
│ ├── index.html
│ ├── app.js
│ └── style.css
│
├── cli/ # Terminal UI
│ └── banner.py # Startup banner (Rich)
│
└── data/ # Bundled files
├── agents/ # 7 agent markdown files
└── skills/ # Workflow skill file
Subpackage Responsibilities
Each subpackage implements one or more harness primitives:
memory/ — Memory & Search (Continual Learning)
ONNX embeddings and vector storage. The foundation of the harness's memory system. Loads the MiniLM-L6-v2 model via ONNX Runtime (~5ms per embedding, 384 dimensions). VectorStore manages the SQLite database with schema migrations. Cosine similarity is computed via numpy — brute-force is sufficient for <10K vectors. Enables agents to durably store knowledge from one session and inject it into future sessions.
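The brute-force search described above fits in a few lines of numpy. A minimal sketch (the function name and call shape are illustrative, not `similarity.py`'s actual API):

```python
import numpy as np

def top_k_similar(query: np.ndarray, vectors: np.ndarray, k: int = 5):
    """Brute-force cosine similarity search over stored embeddings.

    query:   (384,) float32 embedding
    vectors: (N, 384) float32 matrix of stored embeddings
    Returns (indices, scores) of the k best matches, best first.
    """
    # Normalize both sides so a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    idx = np.argsort(scores)[::-1][:k]  # descending by similarity
    return idx, scores[idx]
```

At <10K vectors this full scan stays well under a millisecond, which is why no ANN index is needed.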
compress/ — Context Rot Prevention (Compaction)
Rule-based text compression engine. Addresses context rot — the degradation of model performance as the context window fills up. Removes filler words, articles, hedging phrases, and pleasantries from prose sections while preserving all technical content (code blocks, URLs, file paths, headings, tables). Zero LLM calls. The context_prepare tool orders prompt sections for optimal LLM cache hit rates.
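The preserve-then-compress idea in miniature: code fences are carved out verbatim while filler words are stripped from the surrounding prose. The patterns and function name here are illustrative; `engine.py` and `preservers.py` implement a far richer rule set:

```python
import re

# Illustrative filler list; the real engine covers articles, hedges, pleasantries.
FILLER = re.compile(r"\b(?:just|really|very|basically|actually)\b\s*", re.IGNORECASE)
CODE_BLOCK = re.compile(r"```.*?```", re.DOTALL)

def compress(text: str) -> str:
    parts, last = [], 0
    for m in CODE_BLOCK.finditer(text):
        parts.append(FILLER.sub("", text[last:m.start()]))  # compress prose
        parts.append(m.group(0))                            # preserve code verbatim
        last = m.end()
    parts.append(FILLER.sub("", text[last:]))
    return "".join(parts)
```

Because the rules are pure regex, compression is deterministic and costs no LLM tokens.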
tools/ — Harness Tool Implementations (19 tools)
19 MCP tool implementations mapping to harness primitives. Each tool is an async function decorated with @tool_handler:
| Harness Primitive | Tools | File |
|---|---|---|
| Memory & Search | patterns_search, patterns_store, patterns_prune | patterns.py |
| Drift Detection | drift_check | drift.py |
| Model Routing | model_recommend | routing.py |
| Skills (Progressive Disclosure) | skills_discover, skills_suggest, skills_generate | skills.py |
| Session Persistence | session_save, session_load, session_search | session.py |
| Codebase Awareness | project_index, project_query, project_dependencies, project_snapshot | indexer.py |
| Context Management | context_compress, context_prepare | compress.py |
The mcp_tracking.py module records every MCP call for observability (tool name, arguments, result, duration).
config/ — Harness Configuration
Layered settings resolution. Loads defaults from defaults.py, merges global config (~/.config/ensemble-mcp/config.toml), project config (.ensemble-mcp.toml), and environment variables (ENSEMBLE_MCP_*). Scalar values override; maps merge shallowly; lists replace.
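The merge rule for one layer can be sketched as follows (an illustrative helper, not `settings.py`'s actual API):

```python
def merge_layer(base: dict, override: dict) -> dict:
    """Merge one config layer over another: scalars override,
    maps merge one level deep, lists replace wholesale."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            merged = dict(out[key])
            merged.update(value)        # shallow map merge
            out[key] = merged
        else:
            out[key] = value            # scalars and lists replace
    return out
```

Applying this left-to-right over defaults, global config, project config, and env vars yields the final settings.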
contracts/ — Response Standardization
Response standardization and error taxonomy. Every tool returns {ok, data, error, meta} via the @tool_handler decorator, which handles timing, error wrapping, and the envelope format. Error codes follow a prefix-based taxonomy (VALIDATION_*, NOT_FOUND_*, CONFLICT_*, TIMEOUT_*, IO_*, INTERNAL_*) with built-in retry guidance.
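A minimal reimplementation of the envelope idea, assuming async handlers. This sketch covers only the timing, error wrapping, and envelope shape described above; the error code and field names beyond the envelope keys are illustrative:

```python
import functools
import time

def tool_handler(fn):
    """Wrap an async tool so it always returns {ok, data, error, meta}."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            data, error, ok = await fn(*args, **kwargs), None, True
        except Exception as exc:
            data, ok = None, False
            error = {"code": "INTERNAL_ERROR", "message": str(exc)}
        return {
            "ok": ok,
            "data": data,
            "error": error,
            "meta": {"duration_ms": round((time.monotonic() - start) * 1000, 2)},
        }
    return wrapper
```

Because every tool goes through the same wrapper, callers never have to special-case failures: they check `ok`, then read either `data` or `error`.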
security/ — Trust Boundaries
Input and output safety. redaction.py strips secrets and PII before storage or embedding. trust.py enforces confirmation requirements for destructive operations (e.g., reset requires confirm=true).
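The redaction idea in miniature (the patterns here are illustrative; `redaction.py`'s actual rule set is broader):

```python
import re

# Illustrative secret patterns: API-key-shaped tokens and key=value credentials.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
    re.compile(r"(?i)(password|secret|token)\s*[=:]\s*\S+"),
]

def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything matching a secret pattern before it is stored or embedded."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Running this before embedding matters because vectors cannot be scrubbed after the fact: once a secret is embedded, it is baked into the stored representation.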
state/ — Lifecycle & Idempotency
Session lifecycle and idempotency. Defines state machines for sessions (pending → running → completed | failed | killed) and steps (pending → running → completed | failed | skipped). Supports the long-horizon execution harness primitive by enabling durable state across context windows. Idempotency keys prevent duplicate execution of mutating tools.
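The session state machine can be sketched with an enum and a transition table (state names mirror the diagram above; the `advance` helper is illustrative):

```python
from enum import Enum

class SessionState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    KILLED = "killed"

# Legal transitions; terminal states have no successors.
TRANSITIONS = {
    SessionState.PENDING: {SessionState.RUNNING},
    SessionState.RUNNING: {SessionState.COMPLETED, SessionState.FAILED, SessionState.KILLED},
}

def advance(current: SessionState, target: SessionState) -> SessionState:
    """Move to target state, rejecting any transition the table does not allow."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Rejecting illegal transitions at the boundary is what lets state survive context-window resets: a resumed agent cannot accidentally re-run a completed session.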
installer/ — Harness Setup
Auto-detection and registration of AI tools. Defines ToolDefinition for 6 supported tools (OpenCode, Claude Code, Copilot, Cursor, Windsurf, Devin CLI) with their config paths. Makes it easy to plug ensemble-mcp into any existing agent harness.
dashboard/ — Observability
Local web dashboard. An aiohttp server serving an Alpine.js SPA. Provides visibility into harness state — patterns, skills, drift history, sessions, and codebase index. The API layer (25+ endpoints) opens its own SQLite connections to avoid blocking the MCP server.
cli/ — Terminal UI
Terminal UI. Currently contains only the startup banner, which uses Rich to display server version, config paths, and database location on stderr.
data/ — Bundled Harness Files
Bundled agent and skill files. The 7-agent orchestration pipeline (team-ensemble, team-scope, team-craft, team-forge, team-trace, team-lens, team-signal) and the ensemble-mcp-workflow skill file. These are the system prompts and AGENTS.md files that constitute the orchestration layer of the harness.
Data Flow
Tool Call Flow
- AI tool sends MCP request over stdin
- `server.py` deserializes the request and routes to `_dispatch_tool()`
- `_dispatch_tool()` matches the tool name and calls the handler
- Handler function executes (embedding, DB queries, etc.)
- `@tool_handler` wraps the result in `{ok, data, error, meta}`
- `mcp_tracking.py` records the call in the `mcp_calls` table
- Response is serialized to JSON and written to stdout
Embedding Flow
- Text input → tokenizer (HuggingFace `tokenizers` library)
- Token IDs → ONNX Runtime inference (MiniLM-L6-v2)
- Output → 384-dimensional float32 vector
- Vector stored as raw bytes in SQLite BLOB column
- Search: query vector compared against stored vectors via numpy cosine similarity
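The storage steps above — raw float32 bytes in a BLOB column — can be sketched with sqlite3 and numpy (table and function names are illustrative, not the real `store.py` API):

```python
import sqlite3
import numpy as np

def store_vector(db: sqlite3.Connection, key: str, vec: np.ndarray) -> None:
    """Persist a float32 embedding as raw bytes in a BLOB column."""
    db.execute("CREATE TABLE IF NOT EXISTS vectors (key TEXT PRIMARY KEY, embedding BLOB)")
    db.execute("INSERT OR REPLACE INTO vectors VALUES (?, ?)",
               (key, vec.astype(np.float32).tobytes()))

def load_vector(db: sqlite3.Connection, key: str) -> np.ndarray:
    """Reconstruct the numpy vector from its BLOB bytes."""
    (blob,) = db.execute("SELECT embedding FROM vectors WHERE key = ?", (key,)).fetchone()
    return np.frombuffer(blob, dtype=np.float32)
```

Because `tobytes()`/`frombuffer()` round-trip losslessly, a 384-dim vector costs exactly 1,536 bytes per row with no serialization overhead.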
Storage Layer
SQLite Database
Location: ~/.cache/ensemble-mcp/data.db (WAL mode for concurrent reads)
Core tables:
- `patterns` — stored patterns with embeddings and match counts
- `session_checkpoints` — pipeline checkpoints with state JSON and embeddings
- `drift_history` — drift check results with scores and verdicts
- `project_files` — indexed files with language, role, size, mtime
- `file_exports` — extracted exports (functions, classes, etc.)
- `file_imports` — extracted imports per file
- `skill_suggestions` — proposed skill suggestions from clustering
- `skill_suggestion_patterns` — junction table linking suggestions to patterns
- `skill_usage_tracking` — usage metrics for discovered skills
- `skill_file_cache` — cached skill file content and embeddings
- `mcp_calls` — MCP call history for the dashboard
- `idempotency_keys` — idempotency key deduplication (24h TTL)
- `project_snapshots` — cached project baseline summaries
ONNX Model
Location: ~/.cache/ensemble-mcp/models/
Files:
- `model.onnx` — MiniLM-L6-v2 ONNX model (~22 MB)
- `tokenizer.json` — HuggingFace tokenizer config
Model: sentence-transformers/all-MiniLM-L6-v2 — 384-dimensional embeddings, downloaded from Hugging Face on first use.
Extension Points
Adding a New Tool
- Create or add to a file in `src/ensemble_mcp/tools/`
- Implement an async function decorated with `@tool_handler`
- Add a `Tool` definition in `server.py`'s `TOOL_DEFINITIONS` list
- Add a case to `_dispatch_tool()` in `server.py`
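The handler-plus-dispatch shape from the steps above can be sketched as follows (a stand-in for the real `server.py` machinery; the tool and function names are illustrative):

```python
# A new async handler — in the real server this would also carry @tool_handler.
async def echo_tool(text: str) -> dict:
    return {"echo": text}

# A name -> handler table, standing in for _dispatch_tool()'s matching logic.
HANDLERS = {"echo": echo_tool}

async def dispatch(name: str, **kwargs):
    """Look up the tool by name and invoke it with the request arguments."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise KeyError(f"unknown tool: {name}")
    return await handler(**kwargs)
```

The table-driven shape is what makes step 4 a one-line change per new tool.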
Adding a New AI Tool (Installer)
- Add a `ToolDefinition` to `TOOL_DEFINITIONS` in `src/ensemble_mcp/installer/__init__.py`
- Define: config format, config path, MCP section path, detection paths, server entry format
- The installer automatically handles detection, registration, and backup
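The detection–backup–registration flow can be sketched for a JSON-config tool. The `mcpServers` key and the helper name are illustrative assumptions, not the real `registry.py` API:

```python
import json
import shutil
from pathlib import Path

def register_server(config_path: Path, entry: dict) -> None:
    """Back up the tool's config, then merge in the MCP server entry."""
    if config_path.exists():
        # Non-destructive: copy the original aside before touching it.
        shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
        config = json.loads(config_path.read_text())
    else:
        config = {}
    config.setdefault("mcpServers", {})["ensemble-mcp"] = entry
    config_path.write_text(json.dumps(config, indent=2))
```

The backup-first ordering means a failed or interrupted install can always be rolled back by restoring the `.bak` file.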
Adding Configuration Options
- Add a constant to `src/ensemble_mcp/config/defaults.py`
- Add a field to the `Settings` dataclass in `src/ensemble_mcp/config/settings.py`
- The field is automatically available via TOML config and `ENSEMBLE_MCP_*` env vars
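The env-var layer can be sketched as a prefix scan (an illustrative helper; the real loader is `settings.py`, and the exact name-mangling rule is an assumption):

```python
import os

def env_overrides(prefix: str = "ENSEMBLE_MCP_") -> dict:
    """Collect overrides from the environment: ENSEMBLE_MCP_LOG_LEVEL -> "log_level"."""
    return {
        key[len(prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }
```

Because env vars are merged last, they win over both TOML layers, which is the conventional precedence for containerized deployments.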
Design Principles
- Local-only harness intelligence: Zero LLM/API calls — all intelligence runs locally (ONNX embeddings, numpy similarity, rule-based compression). The harness infrastructure layer adds zero latency or cost from external services.
- Single-file storage: One SQLite database per user, WAL mode for concurrency
- Incremental indexing: mtime-based invalidation avoids re-processing unchanged files
- Standard envelope: Every tool returns `{ok, data, error, meta}` with typed error codes
- Idempotent mutations: Optional idempotency keys prevent duplicate execution
- Non-destructive installs: Config backups are created before any modification
- Harness-agnostic: Works with any MCP-compatible agent harness — not tied to a specific execution environment
Next Steps
- Tool Reference — complete tool documentation
- Integration Guide — using tools in pipelines
- Configuration — all configurable options