Proof
The proof is the part you can run. Seed a small meeting corpus, ask a question where the old answer is wrong, then open the markdown files behind the answer. That is the bet: your meeting memory should be inspectable, not just confidently summarized.
First, install the bundled demo corpus. It writes five fictional meetings to ~/.minutes/demo/ and prints the MCP config for your agent host.
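A sketch of that install step, using the `npx minutes-mcp --demo` entry point described later on this page (the `ls` check is just a way to confirm the fixture files landed):

```shell
# Installs the five-meeting demo corpus into ~/.minutes/demo/
# and prints the MCP config for your agent host.
npx minutes-mcp --demo

# Inspect the markdown fixture files it wrote.
ls ~/.minutes/demo/
```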
Then ask a question whose answer changed between meetings, for example: "What is the current pricing decision?"
The answer should catch the reversal:
The current pricing decision is annual-only. The February meeting tested monthly billing for three consultant signups. The March follow-up reversed that decision after only four signups and worse churn. The March file explicitly supersedes February.
That is the point. The agent is not hallucinating from a summary blob; it is walking the same receipts you can open.
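That "walking the receipts" loop can be sketched in a few lines. Everything here is illustrative: the filenames, the date-prefix naming, and the `Decision:` line format are assumptions for the sketch, not the real Minutes file contract.

```python
from pathlib import Path
import tempfile

def latest_decision(corpus_dir: Path, topic: str) -> tuple[str, str]:
    """Scan dated meeting files and return (filename, decision line)
    from the most recent file mentioning the topic. Later files
    supersede earlier ones simply by sorting on the date prefix."""
    hits = []
    for md in sorted(corpus_dir.glob("*.md")):  # YYYY-MM-DD-*.md sorts chronologically
        text = md.read_text()
        if topic in text:
            for line in text.splitlines():
                if line.startswith("Decision:"):
                    hits.append((md.name, line))
    if not hits:
        raise LookupError(f"no decision found for {topic!r}")
    return hits[-1]  # the latest file wins

# Tiny fixture mirroring the February/March reversal described above.
with tempfile.TemporaryDirectory() as d:
    corpus = Path(d)
    (corpus / "2025-02-10-pricing.md").write_text(
        "pricing\nDecision: test monthly billing for consultants\n")
    (corpus / "2025-03-12-pricing.md").write_text(
        "pricing\nDecision: revert to annual-only; monthly test underperformed\n")
    name, decision = latest_decision(corpus, "pricing")
    print(name, "->", decision)
```

The agent's version of this loop is richer, but the property being demonstrated is the same: the answer is derived from files you can open, and the March file wins because it is later, not because a summary says so.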
60-second demo
Runnable now: npx minutes-mcp --demo installs a five-meeting fixture corpus into ~/.minutes/demo/ and prints an MCP config pointed at that corpus. A new evaluator can try search and recall without donating a real meeting first.
Try it
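The printed config is not reproduced here verbatim; a typical MCP client entry for a stdio server looks roughly like this (the server name and the `--corpus` flag are illustrative assumptions, not documented options):

```json
{
  "mcpServers": {
    "minutes": {
      "command": "npx",
      "args": ["minutes-mcp", "--corpus", "~/.minutes/demo/"]
    }
  }
}
```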
Agent eval v0.1
Smoke test: The current eval has 10 fictional meeting files, 20 maintainer-authored questions, a runner, and a provisional Claude-on-Claude 20/20 pre-grade. The harness runs and gives skeptics something concrete to poke at; it is not independent benchmark evidence.
Read results
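To make the eval's shape concrete, here is an entirely hypothetical picture of one question plus a toy grader; the field names and the keyword-matching rubric are invented for illustration and are not the actual harness schema.

```python
# Hypothetical shape of one eval item; field names are illustrative.
question = {
    "id": "q07",
    "question": "What is the current pricing decision?",
    "expected_points": ["annual-only", "supersedes"],
    "source_files": ["2025-02-10-pricing.md", "2025-03-12-pricing.md"],
}

def grade(answer: str, item: dict) -> float:
    """Toy keyword grader: fraction of expected phrases present."""
    text = answer.lower()
    hits = sum(1 for phrase in item["expected_points"] if phrase in text)
    return hits / len(item["expected_points"])

score = grade(
    "Pricing is annual-only; the March note supersedes February's monthly test.",
    question)
print(score)
```

A real rubric would need semantic judgment rather than keyword hits, which is exactly why the published grade leans on a model pre-grader pending human sign-off.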
Reference adapters
Baseline examples: Mem0 and Graphiti adapters show how Minutes markdown maps into external memory systems. They are intentionally small examples, not a supported SDK. Identity-aware ingestion, idempotency, and pinned adapter tests are the next v2 milestone.
See adapters
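In the spirit of those adapters, here is a minimal sketch of mapping a meeting file into a generic memory record. The `key: value` header format and the record fields are assumptions for the sketch; no real Mem0 or Graphiti API is used.

```python
# Map a Minutes-style markdown file into a generic memory record.
# The header format here is assumed, not the documented file contract.
def parse_meeting(markdown: str) -> dict:
    header, _, body = markdown.partition("\n\n")
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip().lower()] = value.strip()
    return {
        "date": meta.get("date"),
        "attendees": [a.strip() for a in meta.get("attendees", "").split(",") if a.strip()],
        "text": body.strip(),
    }

record = parse_meeting(
    "date: 2025-03-12\nattendees: Ana, Raj\n\nDecision: annual-only pricing.")
print(record["date"], record["attendees"])
```

The v2 work named above (identity mapping, idempotency) lives exactly in this layer: deciding that "Ana" in one file and "Ana G." in another are the same attendee, and not ingesting the same file twice.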
v0.1 is useful, but it is not a category benchmark. The corpus, questions, and rubrics are maintainer-authored, and the published grade is same-family model pre-grading with human sign-off still pending.
The reference adapters show the file contract works, but v2 still has work to do: identity mapping, idempotency, pinned dependencies, and CI dry-run coverage.
Eval v0.2
Multi-corpus questions, blind-authored holdouts, hallucination traps, noisy transcript variants, multi-model runs, and head-to-head baselines.
Adapter v2
Per-attendee identity mapping, duplicate-safe manifests, exact version pins, CI dry-runs, and simpler Graphiti setup paths.
Human review
Independent human sign-off on eval runs before any result is treated as more than a provisional smoke test.
Why this matters
Most meeting tools sell the polished summary. Minutes is about the layer underneath: source material your agents can read and you can audit. If the demo feels obvious, the product is the same loop pointed at your real meetings.