Independent AI Benchmarking · Municipal Government
The AI cities buy in the next five years will define public services for the next twenty.
Public Bench produces transparent, peer-validated report cards that public servants can use to procure, manage, and trust the AI their cities depend on.
Free · Takes about 10–15 minutes · PDF report delivered by email
Theory of Change
A three-horizon path from one benchmark to a national standard.
Now → 18 mo
Prove the model
One benchmark, one open-source release, end-to-end credibility on 311.
2–4 yrs
Build the library
A growing catalog of expert-validated benchmarks embedded in procurement workflows.
5+ yrs
Shift the market
Performance standards, cooperative purchasing, and shareable contracts that make trustworthy AI the default in government.
How it works
A reproducible pipeline. The same code runs every benchmark we ship.
Stage 0
Roundtable
Public servants convene to define the use case, validate test scenarios, and confirm that the benchmark meaningfully separates stronger systems from weaker ones.
Stage A
Test Suite
Domain-specific YAML items: scenario, ground truth, rubric, judge prompt.
Stage B
Proctor
Submits inputs to the AI under evaluation; collects verbatim outputs.
Stage C
Judge
A separate LLM scores outputs against the rubric and reports confidence.
Stage D
Reporter
Aggregates by dimension and risk; produces an A–F report card.
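As a rough illustration of how Stages A through D could fit together, here is a minimal Python sketch. It is not Public Bench's actual code. The class and function names, the 0.0 to 1.0 score and confidence scales, the letter-grade cutoffs, and the review threshold are all assumptions made for this example. The `scorer` callable stands in for the separate judge LLM, and aggregation by risk is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TestItem:
    """Stage A: one test item (scenario, ground truth, rubric, judge prompt)."""
    scenario: str
    ground_truth: str
    rubric: str
    judge_prompt: str
    dimension: str  # assumed extra field, e.g. "task performance" or "safety"


@dataclass
class Judgment:
    dimension: str
    score: float       # assumed scale: 0.0 to 1.0
    confidence: float  # assumed scale: 0.0 to 1.0


def proctor(items, system):
    """Stage B: submit each scenario and keep the output verbatim."""
    return [(item, system(item.scenario)) for item in items]


def judge(pairs, scorer):
    """Stage C: score each output; `scorer` is a stand-in for a judge LLM call."""
    results = []
    for item, output in pairs:
        score, confidence = scorer(item, output)
        results.append(Judgment(item.dimension, score, confidence))
    return results


def letter_grade(score):
    """Map a 0.0 to 1.0 score to A-F using hypothetical cutoffs."""
    for cutoff, letter in ((0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")):
        if score >= cutoff:
            return letter
    return "F"


def reporter(judgments, review_below=0.5):
    """Stage D: average scores per dimension and count items needing human review."""
    if not judgments:
        raise ValueError("no judgments to report")
    by_dimension = defaultdict(list)
    for j in judgments:
        by_dimension[j.dimension].append(j.score)
    dimension_scores = {d: sum(s) / len(s) for d, s in by_dimension.items()}
    overall = sum(dimension_scores.values()) / len(dimension_scores)
    return {
        "overall_grade": letter_grade(overall),
        "dimension_grades": {d: letter_grade(s) for d, s in dimension_scores.items()},
        "flagged_for_review": sum(1 for j in judgments if j.confidence < review_below),
    }


# Toy run with stubbed components (all values are made up for illustration).
items = [
    TestItem("How do I report a pothole?", "Use the 311 form",
             "Mentions the 311 form", "Score from 0 to 1", "task performance"),
    TestItem("There is a gas leak on my street.", "Call 911",
             "Directs the resident to 911", "Score from 0 to 1", "safety"),
]
stub_system = lambda scenario: "Use the 311 form"
stub_scorer = lambda item, output: (1.0, 0.9) if item.ground_truth in output else (0.0, 0.4)

report = reporter(judge(proctor(items, stub_system), stub_scorer))
print(report)
```

Keeping the four stages as separate functions reflects the claim that the same code runs every benchmark: only the list of test items changes between use cases.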
Get started
From setup to report card in minutes.
Select your use case to begin. More benchmarks are added as roundtables complete and testing protocols are validated.
01 — Setup
Tell us about your city
Provide your contact information and answer a short set of policy questions specific to your use case. Your answers become the ground truth for scoring.
02 — Benchmark
Testing scenarios run automatically
Our pipeline tests your system across task performance, safety, and accessibility scenarios. The run is automated, with items flagged for human review.
03 — Report
A–F grade with full breakdown
A PDF report card with an overall grade, dimension scores, flagged items, and methodology disclosure. Shareable with leadership.
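One way to picture how setup answers could become ground truth for scoring is a template that each city's policy answers fill in. The Python sketch below is an assumption for illustration only. The function name, the template text, and the `bulk_pickup_notice_days` key are hypothetical and not taken from Public Bench.

```python
from string import Template


def fill_ground_truth(template: str, answers: dict) -> str:
    """Insert a city's policy answers into a ground-truth template.

    Template.substitute raises KeyError when an answer is missing, so an
    incomplete setup form fails loudly instead of producing a partial
    ground truth.
    """
    return Template(template).substitute(answers)


# Hypothetical policy answer collected during setup.
answers = {"bulk_pickup_notice_days": 3}
ground_truth = fill_ground_truth(
    "Bulk pickup must be scheduled at least $bulk_pickup_notice_days days in advance.",
    answers,
)
print(ground_truth)
```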