Ensemble mcp
Harness Infrastructure v0.1.0b5 MIT License 2026

Ensemble mcp

The harness infrastructure layer for AI agents —
a local server that extends any agent harness with memory, skills, drift detection, and intelligence.

19 harness tools · 0 external API calls · 573 tests passing · ~5ms embedding speed
§ 01 Contents
What we'll cover

Twelve chapters, one story.

From the harness concept to live installation.

  1. The Problem — why AI coding tools are wasteful today · Context
  2. What is an Agent Harness? — the model + harness equation · Framing
  3. What is Ensemble mcp? — the harness infrastructure layer · Thesis
  4. 19 Harness Tools — memory, drift, routing, indexing, skills, compression · Surface
  5. Zero-LLM Architecture — how it's efficient · Under
  6. Token & Cost Reduction — five mechanisms that save money · Economics
  7. Live Workflow — what happens behind the scenes · Behavior
  8. Technical Architecture — under the hood · System
  9. Dashboard — observability & visualization · UI
  10. Installation — one command to get started · Onboarding
  11. Future Roadmap — what's next · Horizon
  12. Key Takeaways — the harness pillars · Close
Chapter One

The problem.

AI coding tools today are wasteful and forgetful.

§ 01.01 The Groundhog Day loop
Developer experience today

Every session starts from zero.

The Developer Experience

Session 1: "Use service classes" ✓ Works!
Session 2: "Use service classes" ✗ Forgot
Session 3: "Use service classes" ✗ Again
Session 4: "Use service classes" ✗ …
A team of 10 engineers running 10 pipelines / day wastes an estimated 16.2M tokens per month on redundant context alone.

The four core problems

  • No Memory — no continual learning across sessions. (Harness gap)
  • Silent Drift — no self-verification loop. (Harness gap)
  • Static Routing — no intelligent orchestration. (Harness gap)
  • Redundant Exploration — no codebase awareness. (Harness gap)
Chapter Two

What is an
agent harness?

The model + harness equation.

§ 02.01 The equation
Agent = Model + Harness

A harness is everything around the model.

A harness is every piece of code, configuration, and execution logic that wraps a model to turn it into a useful agent.

  • System prompts — shape agent behavior
  • Tools, skills & MCPs — capabilities
  • Execution environment — filesystem, bash, sandbox
  • Orchestration — subagent spawning, routing
  • Memory & context management — compaction, persistence
  • Hooks & middleware — linting, drift checks

See The Anatomy of an Agent Harness by LangChain for a deeper breakdown.

§ 02.02 Harness layers
Layered architecture

The harness stack.

Agent harnesses have layers. Different providers handle different layers.

Execution Layer
Claude Code / Codex / Cursor · Filesystem, Bash, Sandbox, Browser, Git
+
Intelligence Infrastructure (us)
ensemble-mcp · Memory, Skills, Drift, Routing, Compression, Sessions, Codebase Indexing
+
Orchestration
Ensemble 7-agent pipeline · Captain · Scope · Craft · Forge · Lens · Signal · Trace
+
MODEL
Chapter Three

What is
Ensemble mcp?

The harness infrastructure layer.

§ 03.01 The intelligence layer
The definition

A harness infrastructure layer delivered as a local Python MCP server.

It extends any agent harness with memory, skills, drift detection, and intelligence — without a single external API call.

Adds the intelligence layer

Your AI tool already has a harness (filesystem, bash, sandbox). ensemble-mcp adds the intelligence layer that makes it learn, stay on task, and work smarter over time.

Invisible by design

The developer never types an ensemble-mcp command. The AI agent calls its harness tools automatically in the background.

Key facts
Role: Harness infrastructure
Language: Python 3.11+
Protocol: MCP (Model Context Protocol)
Harness tools: 19 · 8 categories
LLM calls: Zero
Storage: SQLite (WAL mode)
Embeddings: ONNX · ~5ms
Install: uvx ensemble-mcp
Size: ~90 MB
Tests: 573 passing
License: MIT
Chapter Four

19 harness tools
in action.

The primitives: memory · drift · routing · indexing · skills · compression.

§ 04.01 Primitives map
Harness primitives at a glance

Nineteen primitives, eight categories, one envelope.

Group I

Memory & Session

Continual learning. Long-horizon execution.

Memory & Search
  • patterns_search
  • patterns_store
  • patterns_prune
Session Persistence
  • session_save
  • session_load
  • session_search
Group II

Intelligence

Self-verification, routing, progressive disclosure.

Self-Verification
  • drift_check
Model Routing
  • model_recommend
Skills (Progressive Disclosure)
  • skills_discover
  • skills_suggest
  • skills_generate
Group III

Context & Codebase

Context rot prevention + codebase awareness.

Context Rot Prevention
  • context_compress
  • context_prepare
Codebase Awareness
  • project_index
  • project_query
  • project_dependencies
  • project_snapshot
§ 04.02 Memory
Continual learning

Pattern memory.

Agents remember what worked before — the harness enables continual learning.

  • 384-dim vectors — text embedded via ONNX MiniLM-L6-v2
  • SQLite-backed — stored alongside rich metadata
  • Semantic search via cosine similarity — not keyword matching
  • ~5ms per embedding · <1ms per search
graph LR
  S1["Session 1: patterns_store('laravel auth setup', ...)"] --> DB[(SQLite Vector Store)]
  DB --> S5["Session 5: patterns_search('authentication')"]
  S5 --> R["Returns Session 1's approach instantly"]
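In sketch form, the search step is a plain cosine ranking over the stored vectors. A minimal illustration in numpy, using toy 3-dim vectors in place of the real 384-dim MiniLM embeddings (function and pattern names are illustrative, not ensemble-mcp's internal API):

```python
import numpy as np

def cosine_top_k(query_vec, stored, k=3):
    """Rank stored (id, vector) pairs by cosine similarity to the query."""
    ids = [pid for pid, _ in stored]
    mat = np.stack([vec for _, vec in stored])            # shape (n, d)
    q = query_vec / np.linalg.norm(query_vec)
    m = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    sims = m @ q                                          # one dot product per pattern
    top = np.argsort(-sims)[:k]
    return [(ids[i], float(sims[i])) for i in top]

# Toy 3-dim vectors stand in for the real 384-dim MiniLM embeddings.
stored = [
    ("laravel auth setup", np.array([1.0, 0.0, 0.0])),
    ("blog crud scaffold", np.array([0.0, 1.0, 0.0])),
    ("login flow fix",     np.array([0.9, 0.1, 0.0])),
]
query = np.array([1.0, 0.0, 0.0])  # pretend embedding of "authentication"
print(cosine_top_k(query, stored, k=2))
```

For a few thousand vectors this is a single matrix-vector product, which is why no vector database is needed at this scale.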
§ 04.03 Self-verification
Self-verification loop

Drift detection.

A harness primitive that catches agents going off-task before damage is done.

Aligned
Task:    "Add user authentication"
Changes: auth controllers, login views

→ score:   0.12
→ verdict: "aligned"
Significant Drift
Task:    "Add user authentication"
Changes: blog system, payment gateway

→ score:   0.78
→ verdict: "significant_drift"
→ Agent warned before continuing

Verdict scale

Score < 0.25 → aligned · proceed normally
0.25 – 0.59 → minor_drift · log a warning
Score ≥ 0.60 → significant_drift · intervention required
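The scale maps directly to a threshold function. A sketch, with thresholds taken from the table above (the function name is illustrative):

```python
def drift_verdict(score: float) -> str:
    """Map a drift score (0.0 = on task, 1.0 = fully off task) to a verdict."""
    if score < 0.25:
        return "aligned"           # proceed normally
    if score < 0.60:
        return "minor_drift"       # log a warning
    return "significant_drift"     # intervention required

print(drift_verdict(0.12), drift_verdict(0.78))  # → aligned significant_drift
```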
§ 04.04 Routing
Right model for the right job

Smart model routing.

Stop paying premium prices for simple tasks.

Agent           Trivial     Simple      Standard    Complex
Signal · Git    cheapest    cheapest    cheapest    cheapest
Forge · Test    cheapest    cheapest    mid         mid
Lens · Review   cheapest    cheapest    mid         mid
Craft · Code    mid         mid         best        best
Scope · Plan    mid         mid         best        best

A typo fix doesn't need the same model as a new microservice architecture.
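The routing matrix reads as a plain lookup table. A sketch with lowercased agent keys as an assumption (the real model_recommend tool presumably weighs more signal than a single complexity label):

```python
# Illustrative routing table: agent role x task complexity -> model tier.
ROUTING = {
    "signal": {"trivial": "cheapest", "simple": "cheapest", "standard": "cheapest", "complex": "cheapest"},
    "forge":  {"trivial": "cheapest", "simple": "cheapest", "standard": "mid",      "complex": "mid"},
    "lens":   {"trivial": "cheapest", "simple": "cheapest", "standard": "mid",      "complex": "mid"},
    "craft":  {"trivial": "mid",      "simple": "mid",      "standard": "best",     "complex": "best"},
    "scope":  {"trivial": "mid",      "simple": "mid",      "standard": "best",     "complex": "best"},
}

def model_recommend(agent: str, complexity: str) -> str:
    """Pick a tier; even a complex git task never needs the best model."""
    return ROUTING[agent][complexity]

print(model_recommend("signal", "complex"))  # → cheapest
```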

§ 04.05 Indexing
Index once, query instantly

Codebase indexing.

Stop re-exploring the same files on every single run.

  • 30+ languages detected by extension
  • 12 role categories — test, migration, config, model, controller, service…
  • Exported symbols with signatures and docstrings
  • Import/dependency graph across the project
  • Respects .gitignore patterns
  • Incremental updates via file mtime
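Incremental updating by mtime can be sketched in a few lines (an illustrative helper, not the actual indexer code):

```python
import os
import pathlib
import tempfile

def files_to_reindex(paths, indexed_mtimes):
    """Return only files whose on-disk mtime differs from the stored one."""
    return [p for p in paths if indexed_mtimes.get(p) != os.path.getmtime(p)]

# Demo: one file already indexed, one not yet seen.
d = tempfile.mkdtemp()
a = str(pathlib.Path(d, "a.py")); pathlib.Path(a).write_text("print('a')\n")
b = str(pathlib.Path(d, "b.py")); pathlib.Path(b).write_text("print('b')\n")
indexed = {a: os.path.getmtime(a)}        # a is up to date; b is new
print(files_to_reindex([a, b], indexed))  # only b needs reindexing
```

This is why re-indexing ten changed files takes under a second: unchanged files are skipped before any parsing happens.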

Tools

project_index: build / refresh the index
project_query: query by language, path, or text
project_dependencies: import / dependency graph
project_snapshot: compact project summary

Performance

Index 1K files: < 5s
Index 10K files: < 30s
Incremental (10 files): < 1s
Query response: < 5ms
§ 04.06 Progressive disclosure
Patterns graduate into skills

Skill intelligence — progressive disclosure.

A harness primitive that prevents context rot by loading only what's needed.

graph LR
  P["patterns_store after each pipeline"] --> A["Accumulate 20+ patterns"]
  A --> C["Cluster detected (cosine sim ≥ 0.75)"]
  C --> S["skills_suggest (confidence: 0.87)"]
  S --> U{User decision}
  U -->|Accept| G["skills_generate → .ai/skills/skill.md"]
  U -->|Dismiss| D["Permanently suppressed"]
  U -->|Defer| P
  G --> L["skills_discover auto-loads in future sessions"]
Result

Every future project starts with your team's learned knowledge — no re-explaining needed.
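The graduation trigger (a dense cluster of similar patterns) can be sketched as a greedy pass over pairwise cosine similarities. Illustrative only: min_cluster is lowered for the demo, and the real trigger fires after 20+ accumulated patterns:

```python
import numpy as np

def suggest_skill(vectors, sim_threshold=0.75, min_cluster=3):
    """Greedy cluster sketch: if enough patterns sit within the cosine
    threshold of one seed pattern, suggest graduating them into a skill."""
    m = np.stack(vectors)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = m @ m.T                      # pairwise cosine similarity matrix
    for i in range(len(vectors)):
        members = np.flatnonzero(sims[i] >= sim_threshold)
        if len(members) >= min_cluster:
            return {"members": members.tolist(),
                    "confidence": round(float(sims[i][members].mean()), 2)}
    return None                         # no cluster dense enough yet

# Three near-duplicate patterns and one outlier (toy 2-dim vectors).
vecs = [np.array([1.0, 0.0]), np.array([0.95, 0.05]),
        np.array([0.9, 0.1]),  np.array([0.0, 1.0])]
print(suggest_skill(vecs))
```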

§ 04.07 Long-horizon execution
Session persistence & context rot prevention

Resume, and say less.

Long-horizon execution

Crash-proof pipeline execution — the harness maintains durable state across context windows.

  • session_save — checkpoint with optimistic versioning
  • session_load — load latest or specific checkpoint
  • session_search — find past sessions by semantic similarity
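Optimistic versioning for session_save can be sketched with a single guarded UPDATE: the write only succeeds if nobody has bumped the version since the caller loaded it. The schema and error string here are illustrative, not the real ones:

```python
import sqlite3

def session_save(db, session_id, state, expected_version):
    """Checkpoint with optimistic versioning: succeed only if the stored
    version still matches the one the caller loaded."""
    cur = db.execute(
        "UPDATE sessions SET state = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (state, session_id, expected_version))
    db.commit()
    if cur.rowcount == 0:               # someone else saved first
        raise RuntimeError("CONFLICT_STALE_VERSION: reload and retry")
    return expected_version + 1

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT, version INTEGER)")
db.execute("INSERT INTO sessions VALUES ('s1', '{}', 1)")
print(session_save(db, "s1", '{"step": 2}', expected_version=1))  # → 2
```

A stale writer gets a conflict error instead of silently clobbering a newer checkpoint, which matches the CONFLICT_* "retry after refresh" guidance in the error taxonomy.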

Context rot prevention

Context rot degrades model performance as the context window fills up — these tools fight it.

Input
"I'd be happy to help! So basically, in order to make use of the API, you'll definitely need to take into consideration the auth requirements."
Output
"To use API, consider auth requirements."
23 tokens → 7 tokens · ~70% reduction
Preserved: code blocks URLs paths tables
Compressed: filler pleasantries hedging verbose phrases
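Because the engine is rule-based, the example above can be reproduced with a handful of regexes. A tiny illustrative subset of the real rule set, which also protects code blocks, URLs, and paths before applying any substitution:

```python
import re

# Illustrative filler/hedging rules; the real engine ships many more.
RULES = [
    (re.compile(r"\bI'd be happy to help!?\s*", re.I), ""),
    (re.compile(r"\bSo basically,?\s*", re.I), ""),
    (re.compile(r"\bin order to\b", re.I), "to"),
    (re.compile(r"\b(just|really|definitely)\s+", re.I), ""),
    (re.compile(r"\btake into consideration\b", re.I), "consider"),
]

def compress(text: str) -> str:
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"\s+", " ", text).strip()

src = ("I'd be happy to help! So basically, in order to make use of the API, "
       "you'll definitely need to take into consideration the auth requirements.")
print(compress(src))
```

Every rule is a deterministic substitution, which is why compression runs in under 2ms with no model call.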
Chapter Five

Zero-LLM
architecture.

Everything runs locally. Nothing leaves your machine.

§ 05.01 Local-only
Zero external API calls

Everything runs locally.

ensemble-mcp makes zero external API calls. All intelligence runs locally on your machine.

Component       Technology                    Speed
Embeddings      ONNX Runtime + MiniLM-L6-v2   ~5ms per text
Vector search   numpy cosine similarity       <1ms per query
Storage         SQLite (WAL mode)             <5ms per op
Compression     Rule-based regex engine       <2ms per text

Traditional approach

  • Every operation → API call
  • Latency: 500ms – 2000ms
  • Cost: $0.001 – $0.01 each
  • Requires internet
  • Data leaves your machine

ensemble-mcp approach

  • Every operation → LOCAL
  • Latency: 1ms – 10ms
  • Cost: $0.00
  • Works offline
  • Data stays local
§ 05.02 Economics
Near-zero marginal cost

The economics of running nothing.

Cost component         Per user / month
Compute (MCP server)   $0.00
LLM API calls          $0.00
ONNX model serving     $0.00
Data storage           $0.00
Total COGS             $0.00
~97%
gross margin on any paid tier — achievable from day one
Chapter Six

Token & cost
reduction.

Five mechanisms that save money.

§ 06.01 Mechanisms 1 & 2
Pattern memory · codebase indexing

Pattern memory & indexing.

Mechanism 1 · Pattern memory

~15–25% savings on pattern context

Without ensemble-mcp

Every session:
  Agent reads 30 pattern entries
  → ~8,000 tokens consumed

With ensemble-mcp

Every session:
  Agent queries top-3 patterns
  → ~800 tokens consumed

Savings: ~90% → ~$8.10/dev/mo

Mechanism 2 · Codebase indexing

~20–40% savings on codebase exploration

Without ensemble-mcp

Agent explores:
  glob("**/*.php")
  grep("class.*Controller")
  read file by file…

→ ~4-6K tokens per cycle

With ensemble-mcp

Agent queries:
  project_query(
    query="TodoController",
    file_types=["php"]
  )

→ ~700 tokens · ~$4-6/mo
§ 06.02 Mechanism 3
Smart model routing

Tier the task,
save the budget.

~30–60% cost savings by using the right tier for each task.

Without ensemble-mcp

Every task → Claude Opus
  $15/M input, $75/M output

Including:
  ✗ Typo fixes   → Opus
  ✗ Test runs    → Opus
  ✗ Git commits  → Opus

With ensemble-mcp

Complex tasks → Opus   $15/M
Simple tasks  → Sonnet $3/M   · 80% cheaper
Trivial tasks → Haiku  $0.25/M · 98% cheaper
§ 06.03 Mechanisms 4 & 5
Compression + caching

Compression & caching.

Context compression

What gets compressed:

  • Filler — "just", "really", "basically"
  • Pleasantries — "I'd be happy to help!"
  • Verbose phrases — "in order to" → "to"
  • Hedging — "I think", "it seems"

What stays untouched:

  • Code blocks, URLs, file paths
  • Headings, tables, technical content
Result: ~10–23% fewer tokens.

Prompt cache optimization

Static — system prompt, rules; always the same → Cached
Project — conventions, structure; rarely changes → Often cached
Task — current request, diff; changes each time → Not cached

context_prepare maximizes the stable prefix so LLM providers cache more tokens.
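The idea can be sketched in a few lines: keep the stable sections first, and the shared prefix across calls (the cacheable part) stays long. Section contents here are invented for illustration:

```python
import os.path

SYSTEM  = "You are the coding agent. Follow the house rules."  # static: never changes
PROJECT = "Laravel 11 · service classes · Pest tests."         # rarely changes

def context_prepare(task: str) -> str:
    """Order sections stable-first so providers can cache the shared prefix."""
    return "\n\n".join([SYSTEM, PROJECT, task])

a = context_prepare("Fix the login redirect bug.")
b = context_prepare("Add a logout route.")
shared = os.path.commonprefix([a, b])
print(f"{len(shared)} cacheable characters shared across both calls")
```

If the task came first, the two prompts would diverge at the very first character and nothing upstream could be cached.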

§ 06.04 The bottom line
Total cost savings

The cumulative effect.

Per developer · monthly

Savings source      Mechanism              Monthly
Pattern memory      Top-3 vs full dump     ~$8.10
Codebase indexing   Query vs exploration   ~$4–6
Model routing       Right tier per task    Variable
Compression         Rule-based             10–23%
Prompt caching      Section ordering       Variable
Total                                      $12–18+

At scale · annually

Team size        Low        High
10 developers    $1,440     $2,160
50 developers    $7,200     $10,800
100 developers   $14,400    $21,600
500 developers   $72,000    $108,000
Chapter Seven

Live
workflow.

What happens behind the scenes.

§ 07.01 Then & now
Two sessions, side by side

The flywheel in motion.

Session 1 New project

"Set up a smart todo app with auth, CRUD, and auto-categorization."

  1. model_recommend — Recommends "mid" tier (Sonnet) · saves cost
  2. patterns_search — Empty on first project
  3. project_index — Indexes the Laravel scaffold · 200ms
  4. Agent writes code — Uses indexed structure
  5. drift_check — Score 0.12 ✓
  6. patterns_store — Saves for future use

Session 5 New project, similar task

"Set up a recipe app with auth and CRUD."

  1. patterns_search — finds "laravel todo crud setup"
  2. Agent applies — Already knows: Breeze, service classes
  3. Faster output — No re-explaining needed

The flywheel effect

More sessions → More patterns → Better matching → Faster dev → More patterns → ...
After 20+ patterns, skills are auto-detected and permanent project skills are created → every future project starts with learned knowledge.
Chapter Eight

Technical
architecture.

The harness under the hood.

§ 08.01 Layer architecture
Harness layer architecture

Four layers. One contract.

Harness layer    Provider                       Primitives
Execution        Claude Code / Codex / Cursor   Filesystem, Bash, Sandbox, Browser, Git
Intelligence     ensemble-mcp                   Memory, Skills, Drift, Routing, Compression, Sessions, Indexing
Orchestration    Ensemble Pipeline              Captain, Scope, Craft, Forge, Lens, Signal, Trace
Model            Claude / GPT / Gemini          Raw intelligence (text in → text out)

ensemble-mcp is harness-agnostic — it plugs into any MCP-compatible agent via the standard MCP protocol. Not tied to any specific execution environment.

§ 08.02 System overview
Clients · Server · Storage

Six harnesses. One intelligence layer.

graph TB
  subgraph "Agent Harnesses (Execution Layer)"
    C1[OpenCode]
    C2[Claude Code]
    C3[Copilot]
    C4[Cursor]
    C5[Windsurf]
    C6[Devin]
  end
  subgraph "ensemble-mcp (Intelligence Infrastructure)"
    SRV[Server — Tool Dispatch]
    SRV --> PAT["patterns.py (3)"]
    SRV --> DFT["drift.py (1)"]
    SRV --> RTG["routing.py (1)"]
    SRV --> SKL["skills.py (3)"]
    SRV --> SES["session.py (3)"]
    SRV --> IDX["indexer.py (4)"]
    SRV --> CMP["compress.py (2)"]
    SRV --> UTL["health + reset (2)"]
  end
  subgraph "Local Storage"
    DB[(SQLite WAL)]
    MDL[ONNX Model 22MB]
  end
  C1 --> SRV
  C2 --> SRV
  C3 --> SRV
  C4 --> SRV
  C5 --> SRV
  C6 --> SRV
  PAT --> DB
  DFT --> DB
  SKL --> DB
  SES --> DB
  IDX --> DB
  PAT --> MDL
  DFT --> MDL
  SKL --> MDL
§ 08.03 Contracts
Response envelope & error taxonomy

One envelope. Every tool.

Every tool returns this

{
  "ok": true,
  "data": { "...payload..." },
  "error": null,
  "meta": {
    "duration_ms": 12,
    "source": "sqlite",
    "confidence": "exact"
  }
}

Confidence: exact · partial · estimated
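A sketch of a wrapper that produces this envelope (illustrative; the real server also fills meta.source and per-tool confidence):

```python
import time

def envelope(fn, *args, **kwargs):
    """Wrap a tool call in the {ok, data, error, meta} contract."""
    start = time.perf_counter()
    try:
        data, error, ok = fn(*args, **kwargs), None, True
    except Exception as exc:
        data, ok = None, False
        error = {"code": "INTERNAL_UNHANDLED", "message": str(exc)}
    meta = {"duration_ms": round((time.perf_counter() - start) * 1000)}
    return {"ok": ok, "data": data, "error": error, "meta": meta}

print(envelope(lambda: {"patterns": 3}))
```

Because every tool returns the same shape, a client can branch on `ok` once instead of parsing nineteen different response formats.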

Structured error codes

Category       Retry?
VALIDATION_*   Never
NOT_FOUND_*    Never
CONFLICT_*     After refresh
TIMEOUT_*      With backoff
IO_*           With backoff
INTERNAL_*     Only if marked retryable

Every error has a code, retry guidance, and structured details.
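Prefix-based retry guidance can be sketched as a small lookup (the policy names here are illustrative):

```python
RETRY_POLICY = {
    "VALIDATION": "never",
    "NOT_FOUND":  "never",
    "CONFLICT":   "after_refresh",
    "TIMEOUT":    "with_backoff",
    "IO":         "with_backoff",
    "INTERNAL":   "if_marked_retryable",
}

def retry_guidance(error_code: str) -> str:
    """Derive client retry behavior from the error-code family prefix."""
    for family, policy in RETRY_POLICY.items():
        if error_code == family or error_code.startswith(family + "_"):
            return policy
    return "never"  # unknown codes default to the safe choice

print(retry_guidance("TIMEOUT_EMBEDDING"), retry_guidance("NOT_FOUND_PATTERN"))
```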

§ 08.04 Security
Local-first by design

Security.

Secrets never leave the machine. Boundaries are explicit.

  • Secret redaction

    9 regex patterns scan all text before storage — AWS keys, Bearer tokens, API keys, GitHub tokens, passwords — all replaced with [REDACTED].

  • Trust boundaries

    Data is classified by source — local_state (trusted) · client_input (validated) · filesystem_scan (read-only).

  • Local-only dashboard

    Binds to 127.0.0.1, never exposed to the network.

  • DOMPurify sanitization

    All rendered markdown is XSS-sanitized before it ever hits the DOM.

  • Destructive operations

    Require an explicit confirm=true flag. No accidents.
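The redaction pass amounts to a substitution sweep before any write. A sketch with two illustrative patterns out of the nine:

```python
import re

# Two of the nine pattern families; the real list also covers API keys,
# GitHub tokens, and passwords.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                # AWS access key ID
    re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]+=*"),  # HTTP Bearer token
]

def redact(text: str) -> str:
    """Replace anything secret-shaped before it reaches storage."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("export AWS_KEY=AKIAABCDEFGHIJKLMNOP"))  # → export AWS_KEY=[REDACTED]
```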

Chapter Nine

Dashboard &
observability.

Visualize everything at localhost:8787.

§ 09.01 Web dashboard
Web dashboard

A window into the living memory.

$ ensemble-mcp web # Opens browser to localhost:8787
$ ensemble-mcp web --port 9000 # Custom port
Page       What it shows
Overview   Summary cards, drift trend chart, recent activity feed
Patterns   All stored patterns with match counts, search, filtering
Skills     Pending suggestions with confidence scores, stale detection
Projects   Indexed projects with language pie charts, role bar charts
Drift      Drift check history with scores, verdicts, flagged files
Sessions   Session list with lifecycle status, step-by-step detail
Reports    Bug Hunter scan results, health trend charts

Alpine.js · Chart.js · zero build step · Kinetic Architect design system · 11+ JSON API endpoints · same SQLite DB (WAL, no contention)
Chapter Ten

One command
to get started.

Auto-detected. Auto-registered. No friction.

§ 10.01 Getting started
Getting started

Install · register · launch.

Install & register

# Install the package
$ pip install ensemble-mcp
# Or run directly (no install needed)
$ uvx ensemble-mcp
# Auto-detect AI tools and register
$ ensemble-mcp install
# Launch the dashboard
$ ensemble-mcp web

Smart command detection

Priority   Detection              Registered command
1st        ensemble-mcp on PATH   ensemble-mcp
2nd        uvx available          uvx ensemble-mcp
3rd        Neither                python -m ensemble_mcp

Supported AI tools

  • OpenCode · auto-install
  • Claude Code · auto-install
  • GitHub Copilot (VS Code) · auto-install
  • Cursor · auto-install
  • Windsurf · auto-install
  • Devin CLI · auto-install
That's it.

The AI agent will call ensemble-mcp tools automatically. No further configuration needed.

Chapter Eleven

Future
roadmap.

What's done. What's next. What's intentionally deferred.

§ 11.01 Roadmap
What's done, what's next

Roadmap.

Completed

  • 19 MCP tools (memory, drift, routing, skills, sessions, indexer, compress)
  • Web Dashboard with Kinetic Architect redesign
  • Skill Intelligence (pattern-to-skill auto-graduation)
  • Auto-installer for 6 AI tools
  • Context compression + prompt caching
  • Bug Hunter reports dashboard
  • Dashboard v2 (management UI)
  • 573 tests passing

Coming next

Feature                                Priority
Embedding Model Upgrade (512 tokens)   Medium
Real-Time Live View (WebSocket)        Medium
Plugin System                          Low
Advanced Indexing (tree-sitter)        Low
§ 11.02 Scaling
Scaling path

Right-sized for today.

With a clear map of where it grows next.

Scale            Files        Status
Small project    < 10K        ✓ Fully supported, optimal
Medium project   10K – 100K   ✓ Supported with minor tuning
Large monorepo   100K – 1M    ⚠ Needs FAISS, parallel indexing
Enterprise       1M+          ◊ Future: PostgreSQL, ANN, workers

Intentionally deferred

  • No FAISS/Qdrant in v1 — numpy is perfect for < 10K vectors
  • No PostgreSQL — SQLite is right for local storage
  • No distributed architecture — the local model is correct
  • No premature abstraction — interfaces come with the 2nd backend

The current design is not wrong — it's correctly scoped. This documents the upgrade path for when scale demands change.

Chapter Twelve

Key
takeaways.

Five pillars. Five principles. One thesis.

§ 12.01 The harness pillars
The harness pillars

Memory · Self-Verification · Orchestration · Context Rot Prevention · Progressive Disclosure.

01

Memory

Continual learning across sessions

02

Self-Verification

Drift detection keeps agents on task

03

Orchestration

Right model for the right job

04

Context Rot Prevention

Fewer tokens, lower cost

05

Progressive Disclosure

Patterns → Skills → Institutional AI

Harness design principles

  1. Zero-LLM-Call — The harness infrastructure never calls external APIs
  2. Local-First — All data stays on the developer's machine
  3. Harness-Agnostic — Works with any MCP-compatible agent harness
  4. Progressive Disclosure — Load only task-relevant skills and context
  5. Contract-First — All tools use {ok, data, error, meta} envelope
Ready when you are

Get started.

pip install ensemble-mcp && ensemble-mcp install && ensemble-mcp web
GitHub Docs MIT License
Python 3.11+ · ONNX Runtime · SQLite · 573 tests · 19 harness tools · Zero external API calls