Roadmap

Stage and plan for EvalOps Workbench

Current stage
Where this project sits today.
Researching
Now — Showcase deploy
The Vercel deploy is a public landing page with a public telemetry endpoint. The API skeleton is documented; production workload is not yet routed through it.
  • Static landing page is live at vercel.app
  • Public /api/stats Tier-B telemetry endpoint
  • GitHub-derived metrics behind a 5-minute cache
  • TELEMETRY_SCHEMA.md as the public contract
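The 5-minute cache in front of the GitHub-derived metrics can be pictured as a small time-to-live wrapper around an async loader. The sketch below is illustrative only. `cachedLoader`, `TierBStats` and its fields are hypothetical names, not the project's code, and the real payload shape is whatever TELEMETRY_SCHEMA.md defines. The sketch also assumes concurrent requests should share a single in-flight refresh, so that a burst of /api/stats hits after expiry does not fan out into several GitHub calls.

```typescript
// Hypothetical Tier-B payload; the real fields are defined by TELEMETRY_SCHEMA.md.
interface TierBStats {
  tier: "B";
  stars: number;
  openIssues: number;
  fetchedAt: number; // epoch milliseconds
}

const TTL_MS = 5 * 60 * 1000; // 5-minute cache window

// Wraps an async loader so it runs at most once per TTL window.
// Assumption: concurrent callers during a refresh share one in-flight promise.
function cachedLoader<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let entry: { value: T; expiresAt: number } | undefined;
  let inFlight: Promise<T> | undefined;

  return async () => {
    const t = now();
    if (entry && t < entry.expiresAt) return entry.value; // fresh cache hit
    if (inFlight) return inFlight; // a refresh is already running

    const p = load()
      .then((value) => {
        entry = { value, expiresAt: now() + ttlMs };
        return value;
      })
      .finally(() => {
        inFlight = undefined; // allow the next refresh (also after a failed load)
      });
    inFlight = p;
    return p;
  };
}
```

Injecting `now` keeps the expiry logic testable with a fake clock. A failed load leaves any previous entry untouched, so callers keep seeing an error until the next successful refresh rather than silently receiving a cached value.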
Next — MVP build
Implement the four MVP commitments. After this phase, the system is promoted from "showcase" to "live" and the telemetry contract upgrades to Tier-A.
  • Load datasets from JSON or CSV
  • Run prompt or agent variants
  • Score outputs with rubric functions
  • Compare runs and export regressions
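The four commitments form one loop: load a dataset, run each variant over it, score every output with a rubric, then diff two runs to find regressions. The following is a minimal sketch of that loop under stated assumptions, not the planned implementation. The names `Example`, `loadDataset`, `runVariant`, `findRegressions` and `regressionsToCsv` are hypothetical. The CSV reader assumes a header row with `id`, `input` and `expected` columns and no quoted commas. Rubric scores are assumed to lie in [0, 1].

```typescript
// One dataset row: an input and an optional reference answer (hypothetical shape).
interface Example {
  id: string;
  input: string;
  expected?: string;
}

// Loads rows from a JSON array or from a simple header-first CSV.
// Assumption: CSV cells contain no quoted commas or newlines.
function loadDataset(text: string, format: "json" | "csv"): Example[] {
  if (format === "json") return JSON.parse(text) as Example[];
  const [header, ...lines] = text.trim().split(/\r?\n/);
  const cols = header.split(",").map((c) => c.trim());
  return lines.map((line) => {
    const cells = line.split(",");
    const row: Record<string, string> = {};
    for (let i = 0; i < cols.length; i++) row[cols[i]] = (cells[i] ?? "").trim();
    return { id: row.id, input: row.input, expected: row.expected || undefined };
  });
}

type Variant = (input: string) => Promise<string>; // a prompt or agent under test
type Rubric = (output: string, ex: Example) => number; // score in [0, 1]
type RunResult = Map<string, number>; // example id -> score

// Runs one variant over the dataset and scores each output.
async function runVariant(variant: Variant, data: Example[], rubric: Rubric): Promise<RunResult> {
  const scores: RunResult = new Map();
  for (const ex of data) {
    scores.set(ex.id, rubric(await variant(ex.input), ex));
  }
  return scores;
}

interface Regression {
  id: string;
  baseline: number;
  candidate: number;
}

// Flags examples whose score dropped by more than `tolerance` between runs.
function findRegressions(baseline: RunResult, candidate: RunResult, tolerance = 0): Regression[] {
  const out: Regression[] = [];
  for (const [id, before] of baseline) {
    const after = candidate.get(id);
    if (after !== undefined && before - after > tolerance) {
      out.push({ id, baseline: before, candidate: after });
    }
  }
  return out;
}

// Exports regressions as CSV for review.
function regressionsToCsv(regs: Regression[]): string {
  const rows = regs.map((r) => `${r.id},${r.baseline},${r.candidate}`);
  return ["id,baseline,candidate", ...rows].join("\n");
}
```

An exact-match rubric such as `(out, ex) => (out.trim() === ex.expected ? 1 : 0)` is enough to exercise the loop. Graded rubrics plug in the same way because the comparison only looks at score deltas.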
Later — Production graduation
Once the MVP runs real workloads, the dashboard upgrades to Tier-A telemetry with workload counters (a per-system metric set), middleware-recorded query/run logs, and Postgres-persisted aggregations.
  • Tier-A live telemetry replaces Tier-B GitHub-derived metrics
  • Postgres persistence for workload counters
  • Middleware-driven recording with privacy invariants
  • Audit trail surfaced in the dashboard
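Middleware-driven recording with a privacy invariant could look roughly like the wrapper below. It times each handler call and writes only allowlisted metadata to a store. This is a hedged sketch. `RunLog`, `TelemetryStore` and `withTelemetry` are hypothetical names, and the in-memory store merely stands in for the planned Postgres persistence. The particular invariant shown, where the input's length is recorded but never its content, is an assumption chosen for illustration, not the project's stated rule.

```typescript
// Hypothetical run-log record: only these allowlisted fields are ever stored.
interface RunLog {
  system: string;
  kind: "query" | "run";
  ok: boolean;
  durationMs: number;
  inputChars: number; // size of the input, never its content
}

// In-memory stand-in for Postgres-persisted workload counters and the audit trail.
class TelemetryStore {
  readonly counters = new Map<string, number>();
  readonly audit: RunLog[] = [];

  record(log: RunLog): void {
    const key = `${log.system}:${log.kind}:${log.ok ? "ok" : "error"}`;
    this.counters.set(key, (this.counters.get(key) ?? 0) + 1);
    this.audit.push({ ...log });
  }
}

type Handler<Req, Res> = (req: Req) => Promise<Res>;

// Middleware: runs the handler, then records outcome and timing.
// Assumed privacy invariant: raw inputs and outputs never reach the store.
function withTelemetry<Res>(
  store: TelemetryStore,
  system: string,
  kind: RunLog["kind"],
  handler: Handler<{ input: string }, Res>,
  now: () => number = Date.now,
): Handler<{ input: string }, Res> {
  return async (req) => {
    const start = now();
    let ok = false;
    try {
      const res = await handler(req);
      ok = true;
      return res;
    } finally {
      // Recorded on success and on failure; errors still propagate to the caller.
      store.record({ system, kind, ok, durationMs: now() - start, inputChars: req.input.length });
    }
  };
}
```

Building the record from explicit fields, rather than copying the request, means a new request property cannot leak into the audit trail unless someone deliberately adds it to `RunLog`.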