The harness infrastructure layer for AI agents —
a local server that extends any agent harness with memory,
skills, drift detection, and intelligence.
AI coding tools today are wasteful and forgetful.
No continual learning across sessions. (Harness gap)
No self-verification loop. (Harness gap)
No intelligent orchestration. (Harness gap)
No codebase awareness. (Harness gap)
The model + harness equation.
A harness is every piece of code, configuration, and execution logic that wraps a model to turn it into a useful agent.
See "The Anatomy of an Agent Harness" by LangChain for a deeper breakdown.
The harness infrastructure layer.
It extends any agent harness with memory, skills, drift detection, and intelligence — without a single external API call.
Your AI tool already has a harness (filesystem, bash, sandbox). ensemble-mcp adds the intelligence layer that makes it learn, stay on task, and work smarter over time.
The developer never types an ensemble-mcp command. The AI agent calls its harness tools automatically in the background.
| Property | Value |
|---|---|
| Role | Harness infrastructure |
| Language | Python 3.11+ |
| Protocol | MCP (Model Context Protocol) |
| Harness tools | 19 · 8 categories |
| LLM calls | Zero |
| Storage | SQLite (WAL mode) |
| Embeddings | ONNX · ~5ms |
| Install | uvx ensemble-mcp |
| Size | ~90 MB |
| Tests | 573 passing |
| License | MIT |
The primitives: memory · drift · routing · indexing · skills · compression.
Continual learning. Long-horizon execution.
patterns_search · patterns_store · patterns_prune · session_save · session_load · session_search
Self-verification, routing, progressive disclosure.
drift_check · model_recommend · skills_discover · skills_suggest · skills_generate
Context rot prevention + codebase awareness.
context_compress · context_prepare · project_index · project_query · project_dependencies · project_snapshot
Task: "Add user authentication"
Changes: auth controllers, login views
→ score: 0.12 → verdict: "aligned"
Task: "Add user authentication"
Changes: blog system, payment gateway
→ score: 0.78 → verdict: "significant_drift" → Agent warned before continuing
aligned
minor_drift
significant_drift
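A minimal sketch of how a drift score could map to the three verdicts above. The cut-off values are assumptions chosen only so that the two examples (0.12 and 0.78) land where the page says they do; the real thresholds are not stated here, and `drift_verdict` is a hypothetical helper, not the `drift_check` tool itself.

```python
# Hypothetical thresholds: the page shows only 0.12 -> aligned and
# 0.78 -> significant_drift, so these cut-offs are illustrative assumptions.
ALIGNED_MAX = 0.30
MINOR_DRIFT_MAX = 0.60


def drift_verdict(score: float) -> str:
    """Map a drift score in [0, 1] to one of the three verdict labels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0, 1], got {score}")
    if score < ALIGNED_MAX:
        return "aligned"
    if score < MINOR_DRIFT_MAX:
        return "minor_drift"
    return "significant_drift"
```

An agent loop would warn before continuing whenever the verdict is `significant_drift`.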
| Agent | Trivial | Simple | Standard | Complex |
|---|---|---|---|---|
| Signal · Git | cheapest | cheapest | cheapest | cheapest |
| Forge · Test | cheapest | cheapest | mid | mid |
| Lens · Review | cheapest | cheapest | mid | mid |
| Craft · Code | mid | mid | best | best |
| Scope · Plan | mid | mid | best | best |
A typo fix doesn't need the same model as a new microservice architecture.
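The routing table above can be read as a plain lookup. This sketch copies the table verbatim; `recommend_tier` and the lower-case complexity keys are hypothetical names for illustration, not the signature of `model_recommend`.

```python
COMPLEXITIES = ("trivial", "simple", "standard", "complex")

# One tuple per agent, in the same column order as the table.
ROUTING = {
    "Signal": ("cheapest", "cheapest", "cheapest", "cheapest"),
    "Forge": ("cheapest", "cheapest", "mid", "mid"),
    "Lens": ("cheapest", "cheapest", "mid", "mid"),
    "Craft": ("mid", "mid", "best", "best"),
    "Scope": ("mid", "mid", "best", "best"),
}


def recommend_tier(agent: str, complexity: str) -> str:
    """Return the model tier for an agent role and task complexity."""
    if agent not in ROUTING:
        raise KeyError(f"unknown agent: {agent}")
    if complexity not in COMPLEXITIES:
        raise KeyError(f"unknown complexity: {complexity}")
    return ROUTING[agent][COMPLEXITIES.index(complexity)]
```

For example, `recommend_tier("Signal", "complex")` stays on the cheapest tier, while `recommend_tier("Craft", "standard")` moves to the best one.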
.gitignore patterns
| Tool | Purpose |
|---|---|
| project_index | Build / refresh index |
| project_query | Query by language, path, text |
| project_dependencies | Import / dependency graph |
| project_snapshot | Compact project summary |
| Operation | Time |
|---|---|
| Index 1K files | < 5s |
| Index 10K files | < 30s |
| Incremental (10 files) | < 1s |
| Query response | < 5ms |
Every future project starts with your team's learned knowledge — no re-explaining needed.
Crash-proof pipeline execution — the harness maintains durable state across context windows.
session_save: checkpoint with optimistic versioning
session_load: load latest or specific checkpoint
session_search: find past sessions by semantic similarity
Context rot degrades model performance as the context window fills up; these tools fight it.
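For the session tools, checkpointing with optimistic versioning on top of SQLite might look roughly like this. The table layout and the `save_checkpoint` / `load_checkpoint` helpers are assumptions made for illustration; the actual schema and tool parameters are not described on this page.

```python
import json
import sqlite3


class VersionConflict(Exception):
    """The caller's expected version no longer matches the stored one."""


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per session, with a version counter.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "session_id TEXT PRIMARY KEY, "
        "version INTEGER NOT NULL, "
        "state TEXT NOT NULL)"
    )


def save_checkpoint(conn, session_id, state, expected_version):
    """Write state only if the stored version equals expected_version.

    Pass expected_version=0 to create a new session. Returns the new version.
    """
    payload = json.dumps(state)
    with conn:  # commits on success, rolls back if an exception escapes
        if expected_version == 0:
            try:
                conn.execute(
                    "INSERT INTO sessions (session_id, version, state) "
                    "VALUES (?, 1, ?)",
                    (session_id, payload),
                )
            except sqlite3.IntegrityError as exc:
                raise VersionConflict(session_id) from exc
            return 1
        cur = conn.execute(
            "UPDATE sessions SET version = version + 1, state = ? "
            "WHERE session_id = ? AND version = ?",
            (payload, session_id, expected_version),
        )
        if cur.rowcount != 1:
            raise VersionConflict(session_id)
    return expected_version + 1


def load_checkpoint(conn, session_id):
    """Return (version, state) for the latest checkpoint, or None."""
    row = conn.execute(
        "SELECT version, state FROM sessions WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))
```

A writer holding a stale version gets `VersionConflict` instead of silently overwriting newer state, which is what makes a reload-then-retry loop safe.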
Everything runs locally. Nothing leaves your machine.
ensemble-mcp makes zero external API calls. All intelligence runs locally on your machine.
| Component | Technology | Speed |
|---|---|---|
| Embeddings | ONNX Runtime + MiniLM-L6-v2 | ~5ms per text |
| Vector search | numpy cosine similarity | <1ms per query |
| Storage | SQLite (WAL mode) | <5ms per op |
| Compression | Rule-based regex engine | <2ms per text |
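The vector search row can be illustrated with a top-k cosine similarity lookup. The page names numpy for this step; the sketch below swaps in the standard library so it runs with no dependencies, and the tiny hand-written vectors stand in for real MiniLM embeddings. `top_k` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, stored, k=3):
    """Return the k (name, score) pairs most similar to query_vec."""
    scored = [
        (name, cosine_similarity(query_vec, vec)) for name, vec in stored.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Returning only the top three matches rather than every stored entry is the same idea the pattern-memory savings figures rely on.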
| Cost component | Per user / month |
|---|---|
| Compute (MCP server) | $0.00 |
| LLM API calls | $0.00 |
| ONNX model serving | $0.00 |
| Data storage | $0.00 |
| Total COGS | $0.00 |
Five mechanisms that save money.
~15–25% savings on pattern context
Without pattern memory, every session: Agent reads 30 pattern entries → ~8,000 tokens consumed
With pattern memory, every session: Agent queries top-3 patterns → ~800 tokens consumed
Savings: ~90% → ~$8.10/dev/mo
~20–40% savings on codebase exploration
Agent explores:
glob("**/*.php")
grep("class.*Controller")
read file by file…
→ ~4-6K tokens per cycle
Agent queries:
project_query(
    query="TodoController",
    file_types=["php"]
)
→ ~700 tokens · ~$4-6/mo
Every task → Claude Opus ($15/M input, $75/M output)
Including:
✗ Typo fixes → Opus
✗ Test runs → Opus
✗ Git commits → Opus
Complex tasks → Opus $15/M
Simple tasks → Sonnet $3/M · 80% cheaper
Trivial tasks → Haiku $0.25/M · 98% cheaper
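The "80% cheaper" and "98% cheaper" figures follow directly from the per-million input prices quoted above. A small sketch of that arithmetic, where the helper names are hypothetical and the prices are the ones on this page rather than a live price list:

```python
# Input prices in USD per million tokens, as quoted on this page.
PRICE_PER_M_INPUT = {"opus": 15.00, "sonnet": 3.00, "haiku": 0.25}


def input_cost(model: str, tokens: int) -> float:
    """Input cost in USD for a given number of tokens."""
    return tokens / 1_000_000 * PRICE_PER_M_INPUT[model]


def savings_vs_opus(model: str) -> float:
    """Fraction saved on input tokens compared with routing to Opus."""
    return 1 - PRICE_PER_M_INPUT[model] / PRICE_PER_M_INPUT["opus"]
```

`savings_vs_opus("sonnet")` gives 0.8 and `savings_vs_opus("haiku")` rounds to 0.98, matching the percentages shown.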
What gets compressed:
What stays untouched:
context_prepare maximizes the stable prefix so LLM providers cache more tokens.
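One way to picture "maximizing the stable prefix": put sections that rarely change ahead of sections that change on every call, so consecutive prompts share as many leading characters as possible. This is a sketch of the idea only; the `(name, text, is_stable)` tuple shape and the helper names are assumptions, not the behaviour of `context_prepare` itself.

```python
def order_sections(sections):
    """Stable sections first, volatile ones last, keeping relative order."""
    stable = [s for s in sections if s[2]]
    volatile = [s for s in sections if not s[2]]
    return stable + volatile


def build_prompt(sections):
    """Join section texts in cache-friendly order."""
    return "\n\n".join(text for _, text, _ in order_sections(sections))


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count
```

Two prompts that differ only in their task section then share the whole stable block as a common prefix, which is the part a provider-side prompt cache can reuse.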
| Savings source | Mechanism | Monthly |
|---|---|---|
| Pattern memory | Top-3 vs full dump | ~$8.10 |
| Codebase indexing | Query vs exploration | ~$4–6 |
| Model routing | Right tier per task | Variable |
| Compression | Rule-based | 10–23% |
| Prompt caching | Section ordering | Variable |
| Total | | $12–18+ |
| Team size | Low (per year) | High (per year) |
|---|---|---|
| 10 developers | $1,440 | $2,160 |
| 50 developers | $7,200 | $10,800 |
| 100 developers | $14,400 | $21,600 |
| 500 developers | $72,000 | $108,000 |
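The team figures are the per-developer monthly range ($12 to $18) multiplied by head count and twelve months. A one-line sketch of that arithmetic, with a hypothetical function name:

```python
def annual_savings(developers: int, monthly_per_dev: float) -> float:
    """Yearly savings for a team, given per-developer monthly savings in USD."""
    return developers * monthly_per_dev * 12
```

For instance, `annual_savings(10, 12)` is 1,440 and `annual_savings(500, 18)` is 108,000, the two corners of the table.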
What happens behind the scenes.
"Set up a smart todo app with auth, CRUD, and auto-categorization."
model_recommend: recommends "mid" tier (Sonnet) · saves cost
patterns_search: empty on first project
project_index: indexes the Laravel scaffold · 200ms
drift_check: score 0.12 ✓
patterns_store: saves for future use
"Set up a recipe app with auth and CRUD."
patterns_search: finds "laravel todo crud setup"
The harness under the hood.
| Harness layer | Provider | Primitives |
|---|---|---|
| Execution | Claude Code / Codex / Cursor | Filesystem, Bash, Sandbox, Browser, Git |
| Intelligence | ensemble-mcp | Memory, Skills, Drift, Routing, Compression, Sessions, Indexing |
| Orchestration | Ensemble Pipeline | Captain, Scope, Craft, Forge, Lens, Signal, Trace |
| Model | Claude / GPT / Gemini | Raw intelligence (text in → text out) |
ensemble-mcp is harness-agnostic: it plugs into any MCP-compatible agent over standard MCP and is not tied to any specific execution environment.
{
  "ok": true,
  "data": { "...payload..." },
  "error": null,
  "meta": {
    "duration_ms": 12,
    "source": "sqlite",
    "confidence": "exact"
  }
}
Confidence: exact · partial · estimated
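A sketch of a helper that builds the envelope shown above. The four top-level keys, the `meta` fields and the confidence levels come from this page; the `make_envelope` name, its parameters and the shape of the error dictionary are assumptions for illustration.

```python
import json

CONFIDENCE_LEVELS = ("exact", "partial", "estimated")


def make_envelope(data=None, error=None, *, duration_ms, source,
                  confidence="exact"):
    """Build an {ok, data, error, meta} response dictionary."""
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence level: {confidence}")
    return {
        "ok": error is None,
        # On failure the payload is dropped so callers only inspect "error".
        "data": data if error is None else None,
        "error": error,
        "meta": {
            "duration_ms": duration_ms,
            "source": source,
            "confidence": confidence,
        },
    }


# Usage: a successful response, serialised as a tool result.
response = json.dumps(make_envelope({"hits": 3}, duration_ms=12, source="sqlite"))
```

Keeping one shape for success and failure means a client can branch on `ok` without inspecting tool-specific payloads.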
| Category | Retry? |
|---|---|
| VALIDATION_* | Never |
| NOT_FOUND_* | Never |
| CONFLICT_* | After refresh |
| TIMEOUT_* | With backoff |
| IO_* | With backoff |
| INTERNAL_* | If marked |
Every error has a code, retry guidance, and structured details.
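On the client side, the retry table could be applied with a prefix lookup like the one below. The category prefixes and policies mirror the table; the function names, the string labels, the example code `NOT_FOUND_SESSION` and the backoff constants are hypothetical.

```python
# Policy per error-code prefix, copied from the retry table.
RETRY_POLICY = {
    "VALIDATION": "never",
    "NOT_FOUND": "never",
    "CONFLICT": "after_refresh",
    "TIMEOUT": "backoff",
    "IO": "backoff",
    "INTERNAL": "if_marked",
}


def retry_guidance(code: str, marked_retryable: bool = False) -> str:
    """Return 'never', 'after_refresh', 'backoff' or 'retry' for an error code."""
    for prefix, policy in RETRY_POLICY.items():
        if code.startswith(prefix + "_"):
            if policy == "if_marked":
                return "retry" if marked_retryable else "never"
            return policy
    return "never"  # unknown codes are treated as non-retryable


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff in seconds; base and cap are assumed values."""
    return min(cap, base * 2 ** attempt)
```

Defaulting unknown codes to `never` is a conservative choice for this sketch; a real client might prefer to surface them to the user instead.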
9 regex patterns scan all text before storage — AWS keys, Bearer tokens, API keys, GitHub tokens, passwords — all replaced with [REDACTED].
Data is classified by source — local_state (trusted) · client_input (validated) · filesystem_scan (read-only).
Binds to 127.0.0.1, never exposed to the network.
All rendered markdown is XSS-sanitized before it ever hits the DOM.
Destructive operations require an explicit confirm=true flag. No accidents.
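The secret-redaction step mentioned above can be sketched with a handful of regular expressions. These three patterns are illustrative only: they are not the nine patterns the server ships with, and real-world token formats vary more than this.

```python
import re

# Illustrative patterns (assumptions, not the project's actual rule set).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                 # AWS access key ID shape
    re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]+=*"),   # HTTP bearer token
    re.compile(r"ghp_[A-Za-z0-9]{36}"),              # GitHub personal token shape
]


def redact(text: str) -> str:
    """Replace anything matching a secret pattern with [REDACTED]."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Running `redact` on every string before it is written to storage keeps secrets out of the database even when an agent pastes raw logs into a pattern or session.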
Visualize everything at localhost:8787.
| Page | What it shows |
|---|---|
| Overview | Summary cards, drift trend chart, recent activity feed |
| Patterns | All stored patterns with match counts, search, filtering |
| Skills | Pending suggestions with confidence scores, stale detection |
| Projects | Indexed projects with language pie charts, role bar charts |
| Drift | Drift check history with scores, verdicts, flagged files |
| Sessions | Session list with lifecycle status, step-by-step detail |
| Reports | Bug Hunter scan results, health trend charts |
Auto-detected. Auto-registered. No friction.
| Priority | Detection | Registered |
|---|---|---|
| 1st | ensemble-mcp on PATH | ensemble-mcp |
| 2nd | uvx available | uvx ensemble-mcp |
| 3rd | Neither | python -m ensemble_mcp |
The AI agent will call ensemble-mcp tools automatically. No further configuration needed.
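The detection priority in the table can be expressed as a short fallback chain. `resolve_server_command` is a hypothetical name, and the `which` parameter exists only so the lookup can be swapped out in tests; by default it uses the standard library's `shutil.which`.

```python
import shutil


def resolve_server_command(which=shutil.which):
    """Pick the command to register, following the priority table."""
    if which("ensemble-mcp"):
        return ["ensemble-mcp"]            # 1st: binary already on PATH
    if which("uvx"):
        return ["uvx", "ensemble-mcp"]     # 2nd: run through uvx
    return ["python", "-m", "ensemble_mcp"]  # 3rd: module fallback
```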
What's done. What's next. What's intentionally deferred.
| Feature | Priority |
|---|---|
| Embedding Model Upgrade (512 tokens) | Medium |
| Real-Time Live View (WebSocket) | Medium |
| Plugin System | Low |
| Advanced Indexing (tree-sitter) | Low |
| Scale | Files | Status |
|---|---|---|
| Small project | < 10K | ✓ Fully supported, optimal |
| Medium project | 10K – 100K | ✓ Supported with minor tuning |
| Large monorepo | 100K – 1M | ⚠ Needs FAISS, parallel indexing |
| Enterprise | 1M+ | ◊ Future — PostgreSQL, ANN, workers |
The current design is not wrong — it's correctly scoped. This documents the upgrade path for when scale demands change.
Five pillars. Five principles. One thesis.
Continual learning across sessions
Drift detection keeps agents on task
Right model for the right job
Fewer tokens, lower cost
Patterns → Skills → Institutional AI
{ok, data, error, meta} envelope