Roadmap
Stage and plan for EvalOps Workbench
Current stage
Where this project sits today.
Researching
Now — Showcase deploy
The Vercel deploy is a public landing page with a public telemetry endpoint. The API skeleton is documented, but no production workload is routed through it yet.
- Static landing page is live at vercel.app
- Public /api/stats Tier-B telemetry endpoint
- GitHub-derived metrics behind a 5-minute cache
- TELEMETRY_SCHEMA.md as the public contract
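To illustrate the 5-minute cache in front of the GitHub-derived metrics, here is a minimal sketch of a time-to-live cache. It is not the project's actual handler: `TTLCache`, `fetch_github_metrics`, and the returned field names are hypothetical, and the GitHub call is stubbed out.

```python
import time
from typing import Any, Callable, Optional, Tuple

CACHE_TTL_SECONDS = 5 * 60  # the 5-minute window named in the roadmap


class TTLCache:
    """Caches one computed value and recomputes it once the TTL has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entry: Optional[Tuple[float, Any]] = None  # (stored_at, value)

    def get(self) -> Any:
        now = self._clock()
        # Refetch on the first call or when the stored value is at least ttl old.
        if self._entry is None or now - self._entry[0] >= self._ttl:
            self._entry = (now, self._fetch())
        return self._entry[1]


def fetch_github_metrics() -> dict:
    # Hypothetical placeholder: a real handler would query the GitHub API here.
    return {"stars": 0, "open_issues": 0}


# A stats handler could serve stats_cache.get() on every request.
stats_cache = TTLCache(fetch_github_metrics)
```

With this shape, upstream calls are bounded to at most one per TTL window regardless of how often the endpoint is hit.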
Next — MVP build
Implement the four MVP commitments. After this phase, the system is promoted from "showcase" to "live" and the telemetry contract upgrades to Tier-A.
- Load datasets from JSON or CSV
- Run prompt or agent variants
- Score outputs with rubric functions
- Compare runs and export regressions
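The four commitments above could fit together roughly as follows. This is a sketch under stated assumptions, not the planned implementation: the function names, the `input`/`expected` column names, and the rule that a regression is a row whose candidate score falls below its baseline score are all hypothetical.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Row = Dict[str, str]
Variant = Callable[[Row], str]        # a prompt or agent variant: row -> output
Rubric = Callable[[Row, str], float]  # a rubric function: (row, output) -> score


def load_dataset(text: str, fmt: str) -> List[Row]:
    """Parse a JSON array of objects, or CSV text with a header row."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")


def run_variant(rows: List[Row], variant: Variant, rubric: Rubric) -> List[float]:
    """Run one variant over every row and score each output with the rubric."""
    return [rubric(row, variant(row)) for row in rows]


def find_regressions(
    rows: List[Row],
    baseline: List[float],
    candidate: List[float],
    tolerance: float = 0.0,
) -> List[dict]:
    """Return rows where the candidate scored lower than the baseline by more than tolerance."""
    return [
        {"index": i, "row": row, "baseline": b, "candidate": c}
        for i, (row, b, c) in enumerate(zip(rows, baseline, candidate))
        if c < b - tolerance
    ]


def exact_match(row: Row, output: str) -> float:
    # Assumed rubric: full credit only when the output equals the expected value.
    return 1.0 if output.strip() == row["expected"] else 0.0


if __name__ == "__main__":
    rows = load_dataset("input,expected\n2+2,4\n3+3,6\n", "csv")
    answers = {"2+2": "4", "3+3": "6"}
    baseline = run_variant(rows, lambda r: answers[r["input"]], exact_match)
    candidate = run_variant(rows, lambda r: "4", exact_match)
    # Export the regressions as JSON; only the second row regresses here.
    print(json.dumps(find_regressions(rows, baseline, candidate), indent=2))
```

Keeping variants and rubrics as plain callables means a new prompt, agent, or scoring rule can be added without touching the loader or the comparison step.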
Later — Production graduation
Once the MVP runs real workloads, the dashboard upgrades to Tier-A telemetry: workload counters (a per-system metric set), middleware-recorded query and run logs, and Postgres-persisted aggregations.
- Tier-A live telemetry replaces Tier-B GitHub-derived metrics
- Postgres persistence for workload counters
- Middleware-driven recording with privacy invariants
- Audit trail surfaced in the dashboard
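As a rough illustration of middleware-driven recording, the sketch below wraps a handler so that every successful call increments a workload counter and appends a log entry. The roadmap does not spell out the privacy invariants, so the one used here (store only a SHA-256 digest and the length of the payload, never the raw text) is an assumption, as are all names. Postgres persistence is left out; the counters live in memory.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, List

Handler = Callable[[str], str]


class TelemetryRecorder:
    """In-memory stand-in for the persisted workload counters and run log."""

    def __init__(self) -> None:
        self.counters: Counter = Counter()
        self.log: List[Dict[str, object]] = []

    def record(self, kind: str, payload: str) -> None:
        self.counters[kind] += 1
        # Assumed privacy invariant: the raw payload is never stored,
        # only a digest and its length.
        self.log.append({
            "kind": kind,
            "payload_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
            "payload_chars": len(payload),
        })


def with_telemetry(kind: str, handler: Handler, recorder: TelemetryRecorder) -> Handler:
    """Wrap a handler so each call that returns without raising is recorded."""

    def wrapped(payload: str) -> str:
        result = handler(payload)
        recorder.record(kind, payload)
        return result

    return wrapped


if __name__ == "__main__":
    recorder = TelemetryRecorder()
    run = with_telemetry("run", lambda prompt: prompt.upper(), recorder)
    run("summarize this document")
    print(dict(recorder.counters))
    print(recorder.log[0]["payload_chars"])
```

Because recording happens in the wrapper rather than in each handler, the invariant is enforced in one place, and the log entries could double as the source for a dashboard audit trail.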