Ensemble mcp
Harness Infrastructure v0.1.0b5 MIT License 2026

Ensemble mcp

The harness infrastructure layer for AI agents —
a local server that extends any agent harness with memory, skills, drift detection, and intelligence.

19 harness tools · 0 external API calls · 573 tests passing · ~5ms embedding speed
§ 01 Contents
What we'll cover

Twelve chapters, one story.

From the harness concept to live installation.

  1. The Problem — why AI coding tools are wasteful today · Context
  2. What is an Agent Harness? — the model + harness equation · Framing
  3. What is Ensemble mcp? — the harness infrastructure layer · Thesis
  4. 19 Harness Tools — memory, drift, routing, indexing, skills, compression · Surface
  5. Zero-LLM Architecture — how it's efficient · Under
  6. Token & Cost Reduction — five mechanisms that save money · Economics
  7. Live Workflow — what happens behind the scenes · Behavior
  8. Technical Architecture — under the hood · System
  9. Dashboard — observability & visualization · UI
  10. Installation — one command to get started · Onboarding
  11. Future Roadmap — what's next · Horizon
  12. Key Takeaways — the harness pillars · Close
Chapter One

The problem.

AI coding tools today are wasteful and forgetful.

§ 01.01 The Groundhog Day loop
Developer experience today

Every session starts from zero.

The Developer Experience

Session 1: "Use service classes" ✓ Works!
Session 2: "Use service classes" ✗ Forgot
Session 3: "Use service classes" ✗ Again
Session 4: "Use service classes" ✗ …
A team of 10 engineers running 10 pipelines / day wastes an estimated 16.2M tokens per month on redundant context alone.

The four core problems

  • No Memory — no continual learning across sessions. (Harness gap)
  • Silent Drift — no self-verification loop. (Harness gap)
  • Static Routing — no intelligent orchestration. (Harness gap)
  • Redundant Exploration — no codebase awareness. (Harness gap)
Chapter Two

What is an
agent harness?

The model + harness equation.

§ 02.01 The equation
Agent = Model + Harness

A harness is everything around the model.

A harness is every piece of code, configuration, and execution logic that wraps a model to turn it into a useful agent.

  • System prompts — shape agent behavior
  • Tools, skills & MCPs — capabilities
  • Execution environment — filesystem, bash, sandbox
  • Orchestration — subagent spawning, routing
  • Memory & context management — compaction, persistence
  • Hooks & middleware — linting, drift checks

See The Anatomy of an Agent Harness by LangChain for a deeper breakdown.

§ 02.02 Harness layers
Layered architecture

The harness stack.

Agent harnesses have layers. Different providers handle different layers.

Execution Layer
Claude Code / Codex / Cursor · Filesystem, Bash, Sandbox, Browser, Git
+
Intelligence Infrastructure (us)
ensemble-mcp · Memory, Skills, Drift, Routing, Compression, Sessions, Codebase Indexing
+
Orchestration
Ensemble 7-agent pipeline · Captain · Scope · Craft · Forge · Lens · Signal · Trace
+
MODEL
Chapter Three

What is
Ensemble mcp?

The harness infrastructure layer.

§ 03.01 The intelligence layer
The definition

A harness infrastructure layer delivered as a local Python MCP server.

It extends any agent harness with memory, skills, drift detection, and intelligence — without a single external API call.

Adds the intelligence layer

Your AI tool already has a harness (filesystem, bash, sandbox). ensemble-mcp adds the intelligence layer that makes it learn, stay on task, and work smarter over time.

Invisible by design

The developer never types an ensemble-mcp command. The AI agent calls its harness tools automatically in the background.

Key facts
Role: Harness infrastructure
Language: Python 3.11+
Protocol: MCP (Model Context Protocol)
Harness tools: 19 · 8 categories
LLM calls: Zero
Storage: SQLite (WAL mode)
Embeddings: ONNX · ~5ms
Install: uvx ensemble-mcp
Size: ~90 MB
Tests: 573 passing
License: MIT
Chapter Four

19 harness tools
in action.

The primitives: memory · drift · routing · indexing · skills · compression.

§ 04.01 Primitives map
Harness primitives at a glance

Nineteen primitives, eight categories, one envelope.

Group I

Memory & Session

Continual learning. Long-horizon execution.

Memory & Search
  • patterns_search
  • patterns_store
  • patterns_prune
Session Persistence
  • session_save
  • session_load
  • session_search
Group II

Intelligence

Self-verification, routing, progressive disclosure.

Self-Verification
  • drift_check
Model Routing
  • model_recommend
Skills (Progressive Disclosure)
  • skills_discover
  • skills_suggest
  • skills_generate
Group III

Context & Codebase

Context rot prevention + codebase awareness.

Context Rot Prevention
  • context_compress
  • context_prepare
Codebase Awareness
  • project_index
  • project_query
  • project_dependencies
  • project_snapshot
§ 04.02 Memory
Continual learning

Pattern memory.

Agents remember what worked before — the harness enables continual learning.

  • 384-dim vectors — text embedded via ONNX MiniLM-L6-v2
  • SQLite-backed — stored alongside rich metadata
  • Semantic search via cosine similarity — not keyword matching
  • ~5ms per embedding · <1ms per search
graph LR
  S1["Session 1: patterns_store('laravel auth setup', ...)"] --> DB[(SQLite Vector Store)]
  DB --> S5["Session 5: patterns_search('authentication')"]
  S5 --> R["Returns Session 1's approach instantly"]
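In sketch form, the search step is a plain cosine ranking over the stored vectors. A minimal illustration in numpy, using toy 3-dim vectors in place of the real 384-dim MiniLM embeddings (function and pattern names are illustrative, not ensemble-mcp's internal API):

```python
import numpy as np

def cosine_top_k(query_vec, stored, k=3):
    """Rank stored (id, vector) pairs by cosine similarity to the query."""
    ids = [pid for pid, _ in stored]
    mat = np.stack([vec for _, vec in stored])            # shape (n, d)
    q = query_vec / np.linalg.norm(query_vec)
    m = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    sims = m @ q                                          # one dot product per pattern
    top = np.argsort(-sims)[:k]
    return [(ids[i], float(sims[i])) for i in top]

# Toy 3-dim vectors stand in for the real 384-dim MiniLM embeddings.
stored = [
    ("laravel auth setup", np.array([1.0, 0.0, 0.0])),
    ("blog crud scaffold", np.array([0.0, 1.0, 0.0])),
    ("login flow fix",     np.array([0.9, 0.1, 0.0])),
]
query = np.array([1.0, 0.0, 0.0])  # pretend embedding of "authentication"
print(cosine_top_k(query, stored, k=2))
```

For a few thousand vectors this is a single matrix-vector product, which is why no vector database is needed at this scale.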
§ 04.03 Self-verification
Self-verification loop

Drift detection.

A harness primitive that catches agents going off-task before damage is done.

Aligned
Task:    "Add user authentication"
Changes: auth controllers, login views

→ score:   0.12
→ verdict: "aligned"
Significant Drift
Task:    "Add user authentication"
Changes: blog system, payment gateway

→ score:   0.78
→ verdict: "significant_drift"
→ Agent warned before continuing

Verdict scale

Score < 0.25 → aligned · proceed normally
0.25 – 0.59 → minor_drift · log a warning
Score ≥ 0.60 → significant_drift · intervention required
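The scale maps directly to a threshold function. A sketch, with thresholds taken from the table above (the function name is illustrative):

```python
def drift_verdict(score: float) -> str:
    """Map a drift score (0.0 = on task, 1.0 = fully off task) to a verdict."""
    if score < 0.25:
        return "aligned"           # proceed normally
    if score < 0.60:
        return "minor_drift"       # log a warning
    return "significant_drift"     # intervention required

print(drift_verdict(0.12), drift_verdict(0.78))  # → aligned significant_drift
```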
§ 04.04 Routing
Right model for the right job

Smart model routing.

Stop paying premium prices for simple tasks.

Agent           Trivial     Simple      Standard    Complex
Signal · Git    cheapest    cheapest    cheapest    cheapest
Forge · Test    cheapest    cheapest    mid         mid
Lens · Review   cheapest    cheapest    mid         mid
Craft · Code    mid         mid         best        best
Scope · Plan    mid         mid         best        best

A typo fix doesn't need the same model as a new microservice architecture.
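The routing matrix reads as a plain lookup table. A sketch with lowercased agent keys as an assumption (the real model_recommend tool presumably weighs more signal than a single complexity label):

```python
# Illustrative routing table: agent role x task complexity -> model tier.
ROUTING = {
    "signal": {"trivial": "cheapest", "simple": "cheapest", "standard": "cheapest", "complex": "cheapest"},
    "forge":  {"trivial": "cheapest", "simple": "cheapest", "standard": "mid",      "complex": "mid"},
    "lens":   {"trivial": "cheapest", "simple": "cheapest", "standard": "mid",      "complex": "mid"},
    "craft":  {"trivial": "mid",      "simple": "mid",      "standard": "best",     "complex": "best"},
    "scope":  {"trivial": "mid",      "simple": "mid",      "standard": "best",     "complex": "best"},
}

def model_recommend(agent: str, complexity: str) -> str:
    """Pick a tier; even a complex git task never needs the best model."""
    return ROUTING[agent][complexity]

print(model_recommend("signal", "complex"))  # → cheapest
```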

§ 04.05 Indexing
Index once, query instantly

Codebase indexing.

Stop re-exploring the same files on every single run.

  • 30+ languages detected by extension
  • 12 role categories — test, migration, config, model, controller, service…
  • Exported symbols with signatures and docstrings
  • Import/dependency graph across the project
  • Respects .gitignore patterns
  • Incremental updates via file mtime
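Incremental updating by mtime can be sketched in a few lines (an illustrative helper, not the actual indexer code):

```python
import os
import pathlib
import tempfile

def files_to_reindex(paths, indexed_mtimes):
    """Return only files whose on-disk mtime differs from the stored one."""
    return [p for p in paths if indexed_mtimes.get(p) != os.path.getmtime(p)]

# Demo: one file already indexed, one not yet seen.
d = tempfile.mkdtemp()
a = str(pathlib.Path(d, "a.py")); pathlib.Path(a).write_text("print('a')\n")
b = str(pathlib.Path(d, "b.py")); pathlib.Path(b).write_text("print('b')\n")
indexed = {a: os.path.getmtime(a)}        # a is up to date; b is new
print(files_to_reindex([a, b], indexed))  # only b needs reindexing
```

This is why re-indexing ten changed files takes under a second: unchanged files are skipped before any parsing happens.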

Tools

project_index: build / refresh the index
project_query: query by language, path, or text
project_dependencies: import / dependency graph
project_snapshot: compact project summary

Performance

Index 1K files: < 5s
Index 10K files: < 30s
Incremental (10 files): < 1s
Query response: < 5ms
§ 04.06 Progressive disclosure
Patterns graduate into skills

Skill intelligence — progressive disclosure.

A harness primitive that prevents context rot by loading only what's needed.

graph LR
  P["patterns_store after each pipeline"] --> A["Accumulate 20+ patterns"]
  A --> C["Cluster detected (cosine sim ≥ 0.75)"]
  C --> S["skills_suggest (confidence: 0.87)"]
  S --> U{User decision}
  U -->|Accept| G["skills_generate → .ai/skills/skill.md"]
  U -->|Dismiss| D["Permanently suppressed"]
  U -->|Defer| P
  G --> L["skills_discover auto-loads in future sessions"]
Result

Every future project starts with your team's learned knowledge — no re-explaining needed.
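The graduation trigger (a dense cluster of similar patterns) can be sketched as a greedy pass over pairwise cosine similarities. Illustrative only: min_cluster is lowered for the demo, and the real trigger fires after 20+ accumulated patterns:

```python
import numpy as np

def suggest_skill(vectors, sim_threshold=0.75, min_cluster=3):
    """Greedy cluster sketch: if enough patterns sit within the cosine
    threshold of one seed pattern, suggest graduating them into a skill."""
    m = np.stack(vectors)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = m @ m.T                      # pairwise cosine similarity matrix
    for i in range(len(vectors)):
        members = np.flatnonzero(sims[i] >= sim_threshold)
        if len(members) >= min_cluster:
            return {"members": members.tolist(),
                    "confidence": round(float(sims[i][members].mean()), 2)}
    return None                         # no cluster dense enough yet

# Three near-duplicate patterns and one outlier (toy 2-dim vectors).
vecs = [np.array([1.0, 0.0]), np.array([0.95, 0.05]),
        np.array([0.9, 0.1]),  np.array([0.0, 1.0])]
print(suggest_skill(vecs))
```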

§ 04.07 Long-horizon execution
Session persistence & context rot prevention

Resume, and say less.

Long-horizon execution

Crash-proof pipeline execution — the harness maintains durable state across context windows.

  • session_save — checkpoint with optimistic versioning
  • session_load — load latest or specific checkpoint
  • session_search — find past sessions by semantic similarity
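Optimistic versioning for session_save can be sketched with a single guarded UPDATE: the write only succeeds if nobody has bumped the version since the caller loaded it. The schema and error string here are illustrative, not the real ones:

```python
import sqlite3

def session_save(db, session_id, state, expected_version):
    """Checkpoint with optimistic versioning: succeed only if the stored
    version still matches the one the caller loaded."""
    cur = db.execute(
        "UPDATE sessions SET state = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (state, session_id, expected_version))
    db.commit()
    if cur.rowcount == 0:               # someone else saved first
        raise RuntimeError("CONFLICT_STALE_VERSION: reload and retry")
    return expected_version + 1

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT, version INTEGER)")
db.execute("INSERT INTO sessions VALUES ('s1', '{}', 1)")
print(session_save(db, "s1", '{"step": 2}', expected_version=1))  # → 2
```

A stale writer gets a conflict error instead of silently clobbering a newer checkpoint, which matches the CONFLICT_* "retry after refresh" guidance in the error taxonomy.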

Context rot prevention

Context rot degrades model performance as the context window fills up — these tools fight it.

Input
"I'd be happy to help! So basically, in order to make use of the API, you'll definitely need to take into consideration the auth requirements."
Output
"To use API, consider auth requirements."
23 tokens → 7 tokens · ~70% reduction
Preserved: code blocks URLs paths tables
Compressed: filler pleasantries hedging verbose phrases
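Because the engine is rule-based, the example above can be reproduced with a handful of regexes. A tiny illustrative subset of the real rule set, which also protects code blocks, URLs, and paths before applying any substitution:

```python
import re

# Illustrative filler/hedging rules; the real engine ships many more.
RULES = [
    (re.compile(r"\bI'd be happy to help!?\s*", re.I), ""),
    (re.compile(r"\bSo basically,?\s*", re.I), ""),
    (re.compile(r"\bin order to\b", re.I), "to"),
    (re.compile(r"\b(just|really|definitely)\s+", re.I), ""),
    (re.compile(r"\btake into consideration\b", re.I), "consider"),
]

def compress(text: str) -> str:
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"\s+", " ", text).strip()

src = ("I'd be happy to help! So basically, in order to make use of the API, "
       "you'll definitely need to take into consideration the auth requirements.")
print(compress(src))
```

Every rule is a deterministic substitution, which is why compression runs in under 2ms with no model call.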
Chapter Five

Zero-LLM
architecture.

Everything runs locally. Nothing leaves your machine.

§ 05.01 Local-only
Zero external API calls

Everything runs locally.

ensemble-mcp makes zero external API calls. All intelligence runs locally on your machine.

Component       Technology                    Speed
Embeddings      ONNX Runtime + MiniLM-L6-v2   ~5ms per text
Vector search   numpy cosine similarity       <1ms per query
Storage         SQLite (WAL mode)             <5ms per op
Compression     Rule-based regex engine       <2ms per text

Traditional approach

  • Every operation → API call
  • Latency: 500ms – 2000ms
  • Cost: $0.001 – $0.01 each
  • Requires internet
  • Data leaves your machine

ensemble-mcp approach

  • Every operation → LOCAL
  • Latency: 1ms – 10ms
  • Cost: $0.00
  • Works offline
  • Data stays local
§ 05.02 Economics
Near-zero marginal cost

The economics of running nothing.

Cost component         Per user / month
Compute (MCP server)   $0.00
LLM API calls          $0.00
ONNX model serving     $0.00
Data storage           $0.00
Total COGS             $0.00
~97%
gross margin on any paid tier — achievable from day one
Chapter Six

Token & cost
reduction.

Five mechanisms that save money.

§ 06.01 Mechanisms 1 & 2
Pattern memory · codebase indexing

Pattern memory & indexing.

Mechanism 1 · Pattern memory

~15–25% savings on pattern context

Without ensemble-mcp

Every session:
  Agent reads 30 pattern entries
  → ~8,000 tokens consumed

With ensemble-mcp

Every session:
  Agent queries top-3 patterns
  → ~800 tokens consumed

Savings: ~90% → ~$8.10/dev/mo

Mechanism 2 · Codebase indexing

~20–40% savings on codebase exploration

Without ensemble-mcp

Agent explores:
  glob("**/*.php")
  grep("class.*Controller")
  read file by file…

→ ~4-6K tokens per cycle

With ensemble-mcp

Agent queries:
  project_query(
    query="TodoController",
    file_types=["php"]
  )

→ ~700 tokens · ~$4-6/mo
§ 06.02 Mechanism 3
Smart model routing

Tier the task,
save the budget.

~30–60% cost savings by using the right tier for each task.

Without ensemble-mcp

Every task → Claude Opus
  $15/M input, $75/M output

Including:
  ✗ Typo fixes   → Opus
  ✗ Test runs    → Opus
  ✗ Git commits  → Opus

With ensemble-mcp

Complex tasks → Opus   $15/M
Simple tasks  → Sonnet $3/M   · 80% cheaper
Trivial tasks → Haiku  $0.25/M · 98% cheaper
§ 06.03 Mechanisms 4 & 5
Compression + caching

Compression & caching.

Context compression

What gets compressed:

  • Filler — "just", "really", "basically"
  • Pleasantries — "I'd be happy to help!"
  • Verbose phrases — "in order to" → "to"
  • Hedging — "I think", "it seems"

What stays untouched:

  • Code blocks, URLs, file paths
  • Headings, tables, technical content
Result: ~10–23% fewer tokens.

Prompt cache optimization

Static — system prompt, rules; always the same → Cached
Project — conventions, structure; rarely changes → Often cached
Task — current request, diff; changes each time → Not cached

context_prepare maximizes the stable prefix so LLM providers cache more tokens.
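The idea can be sketched in a few lines: keep the stable sections first, and the shared prefix across calls (the cacheable part) stays long. Section contents here are invented for illustration:

```python
import os.path

SYSTEM  = "You are the coding agent. Follow the house rules."  # static: never changes
PROJECT = "Laravel 11 · service classes · Pest tests."         # rarely changes

def context_prepare(task: str) -> str:
    """Order sections stable-first so providers can cache the shared prefix."""
    return "\n\n".join([SYSTEM, PROJECT, task])

a = context_prepare("Fix the login redirect bug.")
b = context_prepare("Add a logout route.")
shared = os.path.commonprefix([a, b])
print(f"{len(shared)} cacheable characters shared across both calls")
```

If the task came first, the two prompts would diverge at the very first character and nothing upstream could be cached.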

§ 06.04 The bottom line
Total cost savings

The cumulative effect.

Per developer · monthly

Savings source      Mechanism              Monthly
Pattern memory      Top-3 vs full dump     ~$8.10
Codebase indexing   Query vs exploration   ~$4–6
Model routing       Right tier per task    Variable
Compression         Rule-based             10–23%
Prompt caching      Section ordering       Variable
Total                                      $12–18+

At scale · annually

Team size        Low        High
10 developers    $1,440     $2,160
50 developers    $7,200     $10,800
100 developers   $14,400    $21,600
500 developers   $72,000    $108,000
Chapter Seven

Live
workflow.

What happens behind the scenes.

§ 07.01 Then & now
Two sessions, side by side

The flywheel in motion.

Session 1 New project

"Set up a smart todo app with auth, CRUD, and auto-categorization."

  1. model_recommend — Recommends "mid" tier (Sonnet) · saves cost
  2. patterns_search — Empty on first project
  3. project_index — Indexes the Laravel scaffold · 200ms
  4. Agent writes code — Uses indexed structure
  5. drift_check — Score 0.12 ✓
  6. patterns_store — Saves for future use

Session 5 New project, similar task

"Set up a recipe app with auth and CRUD."

  1. patterns_search — finds "laravel todo crud setup"
  2. Agent applies — Already knows: Breeze, service classes
  3. Faster output — No re-explaining needed

The flywheel effect

More sessions → More patterns → Better matching → Faster dev → More patterns → ...
After 20+ patterns, skills are auto-detected and permanent project skills are created → every future project starts with learned knowledge.
Chapter Eight

Technical
architecture.

The harness under the hood.

§ 08.01 Layer architecture
Harness layer architecture

Four layers. One contract.

Harness layer    Provider                       Primitives
Execution        Claude Code / Codex / Cursor   Filesystem, Bash, Sandbox, Browser, Git
Intelligence     ensemble-mcp                   Memory, Skills, Drift, Routing, Compression, Sessions, Indexing
Orchestration    Ensemble Pipeline              Captain, Scope, Craft, Forge, Lens, Signal, Trace
Model            Claude / GPT / Gemini          Raw intelligence (text in → text out)

ensemble-mcp is harness-agnostic — it plugs into any MCP-compatible agent via the standard MCP protocol. Not tied to any specific execution environment.

§ 08.02 System overview
Clients · Server · Storage

Six harnesses. One intelligence layer.

graph TB
  subgraph "Agent Harnesses (Execution Layer)"
    C1[OpenCode]
    C2[Claude Code]
    C3[Copilot]
    C4[Cursor]
    C5[Windsurf]
    C6[Devin]
  end
  subgraph "ensemble-mcp (Intelligence Infrastructure)"
    SRV[Server — Tool Dispatch]
    SRV --> PAT["patterns.py (3)"]
    SRV --> DFT["drift.py (1)"]
    SRV --> RTG["routing.py (1)"]
    SRV --> SKL["skills.py (3)"]
    SRV --> SES["session.py (3)"]
    SRV --> IDX["indexer.py (4)"]
    SRV --> CMP["compress.py (2)"]
    SRV --> UTL["health + reset (2)"]
  end
  subgraph "Local Storage"
    DB[(SQLite WAL)]
    MDL[ONNX Model 22MB]
  end
  C1 --> SRV
  C2 --> SRV
  C3 --> SRV
  C4 --> SRV
  C5 --> SRV
  C6 --> SRV
  PAT --> DB
  DFT --> DB
  SKL --> DB
  SES --> DB
  IDX --> DB
  PAT --> MDL
  DFT --> MDL
  SKL --> MDL
§ 08.03 Contracts
Response envelope & error taxonomy

One envelope. Every tool.

Every tool returns this

{
  "ok": true,
  "data": { "...payload..." },
  "error": null,
  "meta": {
    "duration_ms": 12,
    "source": "sqlite",
    "confidence": "exact"
  }
}

Confidence: exact · partial · estimated
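A sketch of a wrapper that produces this envelope (illustrative; the real server also fills meta.source and per-tool confidence):

```python
import time

def envelope(fn, *args, **kwargs):
    """Wrap a tool call in the {ok, data, error, meta} contract."""
    start = time.perf_counter()
    try:
        data, error, ok = fn(*args, **kwargs), None, True
    except Exception as exc:
        data, ok = None, False
        error = {"code": "INTERNAL_UNHANDLED", "message": str(exc)}
    meta = {"duration_ms": round((time.perf_counter() - start) * 1000)}
    return {"ok": ok, "data": data, "error": error, "meta": meta}

print(envelope(lambda: {"patterns": 3}))
```

Because every tool returns the same shape, a client can branch on `ok` once instead of parsing nineteen different response formats.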

Structured error codes

Category       Retry?
VALIDATION_*   Never
NOT_FOUND_*    Never
CONFLICT_*     After refresh
TIMEOUT_*      With backoff
IO_*           With backoff
INTERNAL_*     Only if marked retryable

Every error has a code, retry guidance, and structured details.
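Prefix-based retry guidance can be sketched as a small lookup (the policy names here are illustrative):

```python
RETRY_POLICY = {
    "VALIDATION": "never",
    "NOT_FOUND":  "never",
    "CONFLICT":   "after_refresh",
    "TIMEOUT":    "with_backoff",
    "IO":         "with_backoff",
    "INTERNAL":   "if_marked_retryable",
}

def retry_guidance(error_code: str) -> str:
    """Derive client retry behavior from the error-code family prefix."""
    for family, policy in RETRY_POLICY.items():
        if error_code == family or error_code.startswith(family + "_"):
            return policy
    return "never"  # unknown codes default to the safe choice

print(retry_guidance("TIMEOUT_EMBEDDING"), retry_guidance("NOT_FOUND_PATTERN"))
```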

§ 08.04 Security
Local-first by design

Security.

Secrets never leave the machine. Boundaries are explicit.

  • Secret redaction

    9 regex patterns scan all text before storage — AWS keys, Bearer tokens, API keys, GitHub tokens, passwords — all replaced with [REDACTED].

  • Trust boundaries

    Data is classified by source — local_state (trusted) · client_input (validated) · filesystem_scan (read-only).

  • Local-only dashboard

    Binds to 127.0.0.1, never exposed to the network.

  • DOMPurify sanitization

    All rendered markdown is XSS-sanitized before it ever hits the DOM.

  • Destructive operations

    Require an explicit confirm=true flag. No accidents.
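The redaction pass amounts to a substitution sweep before any write. A sketch with two illustrative patterns out of the nine:

```python
import re

# Two of the nine pattern families; the real list also covers API keys,
# GitHub tokens, and passwords.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                # AWS access key ID
    re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]+=*"),  # HTTP Bearer token
]

def redact(text: str) -> str:
    """Replace anything secret-shaped before it reaches storage."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("export AWS_KEY=AKIAABCDEFGHIJKLMNOP"))  # → export AWS_KEY=[REDACTED]
```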

Chapter Nine

Dashboard &
observability.

Visualize everything at localhost:8787.

§ 09.01 Web dashboard
Web dashboard

A window into the living memory.

$ ensemble-mcp web # Opens browser to localhost:8787
$ ensemble-mcp web --port 9000 # Custom port
Page       What it shows
Overview   Summary cards, drift trend chart, recent activity feed
Patterns   All stored patterns with match counts, search, filtering
Skills     Pending suggestions with confidence scores, stale detection
Projects   Indexed projects with language pie charts, role bar charts
Drift      Drift check history with scores, verdicts, flagged files
Sessions   Session list with lifecycle status, step-by-step detail
Reports    Bug Hunter scan results, health trend charts

Alpine.js · Chart.js · zero build step · Kinetic Architect design system · 11+ JSON API endpoints · same SQLite DB (WAL, no contention)
Chapter Ten

One command
to get started.

Auto-detected. Auto-registered. No friction.

§ 10.01 Getting started
Getting started

Install · register · launch.

Install & register

# Install the package
$ pip install ensemble-mcp
# Or run directly (no install needed)
$ uvx ensemble-mcp
# Auto-detect AI tools and register
$ ensemble-mcp install
# Launch the dashboard
$ ensemble-mcp web

Smart command detection

Priority   Detection              Registered command
1st        ensemble-mcp on PATH   ensemble-mcp
2nd        uvx available          uvx ensemble-mcp
3rd        Neither                python -m ensemble_mcp

Supported AI tools

  • OpenCode · auto-install
  • Claude Code · auto-install
  • GitHub Copilot (VS Code) · auto-install
  • Cursor · auto-install
  • Windsurf · auto-install
  • Devin CLI · auto-install
That's it.

The AI agent will call ensemble-mcp tools automatically. No further configuration needed.

Chapter Eleven

Future
roadmap.

What's done. What's next. What's intentionally deferred.

§ 11.01 Roadmap
What's done, what's next

Roadmap.

Completed

  • 19 MCP tools (memory, drift, routing, skills, sessions, indexer, compress)
  • Web Dashboard with Kinetic Architect redesign
  • Skill Intelligence (pattern-to-skill auto-graduation)
  • Auto-installer for 6 AI tools
  • Context compression + prompt caching
  • Bug Hunter reports dashboard
  • Dashboard v2 (management UI)
  • 573 tests passing

Coming next

Feature                                Priority
Embedding Model Upgrade (512 tokens)   Medium
Real-Time Live View (WebSocket)        Medium
Plugin System                          Low
Advanced Indexing (tree-sitter)        Low
§ 11.02 Scaling
Scaling path

Right-sized for today.

With a clear map of where it grows next.

Scale            Files        Status
Small project    < 10K        ✓ Fully supported, optimal
Medium project   10K – 100K   ✓ Supported with minor tuning
Large monorepo   100K – 1M    ⚠ Needs FAISS, parallel indexing
Enterprise       1M+          ◊ Future: PostgreSQL, ANN, workers

Intentionally deferred

  • No FAISS/Qdrant in v1 — numpy is perfect for < 10K vectors
  • No PostgreSQL — SQLite is right for local storage
  • No distributed architecture — the local model is correct
  • No premature abstraction — interfaces come with the 2nd backend

The current design is not wrong — it's correctly scoped. This documents the upgrade path for when scale demands change.

Chapter Twelve

Key
takeaways.

Five pillars. Five principles. One thesis.

§ 12.01 The harness pillars
The harness pillars

Memory · Self-Verification · Orchestration · Context Rot Prevention · Progressive Disclosure.

01

Memory

Continual learning across sessions

02

Self-Verification

Drift detection keeps agents on task

03

Orchestration

Right model for the right job

04

Context Rot Prevention

Fewer tokens, lower cost

05

Progressive Disclosure

Patterns → Skills → Institutional AI

Harness design principles

  1. Zero-LLM-Call — The harness infrastructure never calls external APIs
  2. Local-First — All data stays on the developer's machine
  3. Harness-Agnostic — Works with any MCP-compatible agent harness
  4. Progressive Disclosure — Load only task-relevant skills and context
  5. Contract-First — All tools use {ok, data, error, meta} envelope
Ready when you are

Get started.

pip install ensemble-mcp && ensemble-mcp install && ensemble-mcp web
GitHub Docs MIT License
Python 3.11+ · ONNX Runtime · SQLite · 573 tests · 19 harness tools · Zero external API calls