Proof
The proof is the part you can run. Seed a small meeting corpus, ask a question where the old answer is wrong, then open the markdown files behind the answer. That is the bet: your meeting memory should be inspectable, not just confidently summarized.
First, install the bundled demo corpus. It writes five fictional meetings to ~/.minutes/demo/ and prints the MCP config for your agent host.
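A sketch of that install step, using the `npx minutes-mcp --demo` entry point described later on this page (the `ls` check is just a way to confirm the fixture files landed):

```shell
# Installs the five-meeting demo corpus into ~/.minutes/demo/
# and prints the MCP config for your agent host.
npx minutes-mcp --demo

# Inspect the markdown fixture files it wrote.
ls ~/.minutes/demo/
```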
Then ask a question whose answer changed between meetings, for example: "What is the current pricing decision?"
The answer should catch the reversal:
The current pricing decision is annual-only. The February meeting tested monthly billing for three consultant signups. The March follow-up reversed that decision after only four signups and worse churn. The March file explicitly supersedes February.
That is the point. The agent is not hallucinating from a summary blob; it is walking the same receipts you can open.
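That "walking the receipts" loop can be sketched in a few lines. Everything here is illustrative: the filenames, the date-prefix naming, and the `Decision:` line format are assumptions for the sketch, not the real Minutes file contract.

```python
from pathlib import Path
import tempfile

def latest_decision(corpus_dir: Path, topic: str) -> tuple[str, str]:
    """Scan dated meeting files and return (filename, decision line)
    from the most recent file mentioning the topic. Later files
    supersede earlier ones simply by sorting on the date prefix."""
    hits = []
    for md in sorted(corpus_dir.glob("*.md")):  # YYYY-MM-DD-*.md sorts chronologically
        text = md.read_text()
        if topic in text:
            for line in text.splitlines():
                if line.startswith("Decision:"):
                    hits.append((md.name, line))
    if not hits:
        raise LookupError(f"no decision found for {topic!r}")
    return hits[-1]  # the latest file wins

# Tiny fixture mirroring the February/March reversal described above.
with tempfile.TemporaryDirectory() as d:
    corpus = Path(d)
    (corpus / "2025-02-10-pricing.md").write_text(
        "pricing\nDecision: test monthly billing for consultants\n")
    (corpus / "2025-03-12-pricing.md").write_text(
        "pricing\nDecision: revert to annual-only; monthly test underperformed\n")
    name, decision = latest_decision(corpus, "pricing")
    print(name, "->", decision)
```

The agent's version of this loop is richer, but the property being demonstrated is the same: the answer is derived from files you can open, and the March file wins because it is later, not because a summary says so.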
60-second demo
Runnable now: npx minutes-mcp --demo installs a five-meeting fixture corpus into ~/.minutes/demo/ and prints an MCP config pointed at that corpus. A new evaluator can try search and recall without donating a real meeting first.
Try it
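The printed config is not reproduced here verbatim; a typical MCP client entry for a stdio server looks roughly like this (the server name and the `--corpus` flag are illustrative assumptions, not documented options):

```json
{
  "mcpServers": {
    "minutes": {
      "command": "npx",
      "args": ["minutes-mcp", "--corpus", "~/.minutes/demo/"]
    }
  }
}
```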
Agent eval v0.1
Smoke test: The current eval has 10 fictional meeting files, 20 maintainer-authored questions, a runner, and a provisional Claude-on-Claude 20/20 pre-grade. The harness runs and gives skeptics something concrete to poke at; it is not independent benchmark evidence.
Read results
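To make the eval's shape concrete, here is an entirely hypothetical picture of one question plus a toy grader; the field names and the keyword-matching rubric are invented for illustration and are not the actual harness schema.

```python
# Hypothetical shape of one eval item; field names are illustrative.
question = {
    "id": "q07",
    "question": "What is the current pricing decision?",
    "expected_points": ["annual-only", "supersedes"],
    "source_files": ["2025-02-10-pricing.md", "2025-03-12-pricing.md"],
}

def grade(answer: str, item: dict) -> float:
    """Toy keyword grader: fraction of expected phrases present."""
    text = answer.lower()
    hits = sum(1 for phrase in item["expected_points"] if phrase in text)
    return hits / len(item["expected_points"])

score = grade(
    "Pricing is annual-only; the March note supersedes February's monthly test.",
    question)
print(score)
```

A real rubric would need semantic judgment rather than keyword hits, which is exactly why the published grade leans on a model pre-grader pending human sign-off.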
Reference adapters
Baseline examples: Mem0 and Graphiti adapters show how Minutes markdown maps into external memory systems. They are intentionally small examples, not a supported SDK. Identity-aware ingestion, idempotency, and pinned adapter tests are the next v2 milestone.
See adapters
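In the spirit of those adapters, here is a minimal sketch of mapping a meeting file into a generic memory record. The `key: value` header format and the record fields are assumptions for the sketch; no real Mem0 or Graphiti API is used.

```python
# Map a Minutes-style markdown file into a generic memory record.
# The header format here is assumed, not the documented file contract.
def parse_meeting(markdown: str) -> dict:
    header, _, body = markdown.partition("\n\n")
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip().lower()] = value.strip()
    return {
        "date": meta.get("date"),
        "attendees": [a.strip() for a in meta.get("attendees", "").split(",") if a.strip()],
        "text": body.strip(),
    }

record = parse_meeting(
    "date: 2025-03-12\nattendees: Ana, Raj\n\nDecision: annual-only pricing.")
print(record["date"], record["attendees"])
```

The v2 work named above (identity mapping, idempotency) lives exactly in this layer: deciding that "Ana" in one file and "Ana G." in another are the same attendee, and not ingesting the same file twice.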
v0.1 is useful, but it is not a category benchmark. The corpus, questions, and rubrics are maintainer-authored, and the published grade is same-family model pre-grading with human sign-off still pending.
The reference adapters show the file contract works, but v2 still has work to do: identity mapping, idempotency, pinned dependencies, and CI dry-run coverage.
Eval v0.2
Multi-corpus questions, blind-authored holdouts, hallucination traps, noisy transcript variants, multi-model runs, and head-to-head baselines.
Adapter v2
Per-attendee identity mapping, duplicate-safe manifests, exact version pins, CI dry-runs, and simpler Graphiti setup paths.
Human review
Independent human sign-off on eval runs before any result is treated as more than a provisional smoke test.
Why this matters
Most meeting tools sell the polished summary. Minutes is about the layer underneath: source material your agents can read and you can audit. If the demo feels obvious, the product is the same loop pointed at your real meetings.