Promises don't serve the public. Performance does.

Public Bench produces transparent, peer-validated report cards that public servants can use to procure, manage, and trust the AI their cities depend on.

Run a benchmark → Read the methodology

Free · Nominate a city and receive benchmark results within 1–2 weeks.

Mission

We exist to close a structural gap.

Hundreds of municipal governments are now contracting AI for 311, permitting, report writing, and code enforcement — without a shared way to assess quality, and without leverage to hold vendors accountable when systems fail residents.

Public Bench designs, runs, and publishes performance tests for the AI systems cities are buying. We produce transparent, defensible report cards that procurement officers, CIOs, and elected officials can use to make better decisions.

Every benchmark is independent, peer-validated, and open by default.

How it works

Transparent, peer-validated report cards for AI in government.

Step 1

Ground Truthing

Validate emerging use cases by evaluating AI products in partnership with cohort cities.

Step 2

Peer Validation

Convene roundtables of government experts and subject-matter experts (SMEs) to establish performance standards for validated use cases.

Step 3

Build Evaluation Infrastructure

Turn peer-validated standards into technical benchmarks, consumer reports, and public leaderboards.

Step 4

Scale through Procurement Tools

Bake benchmarks into plug-and-play procurement packages and shareable contracts that governments can easily use.

Theory of Change

A causal chain from independent standards to a healthier market.

Build independent performance standards

If we establish trusted, independent performance standards for how AI should perform in municipal contexts…

Establish evaluation infrastructure

…and we build the evaluation infrastructure that makes those standards visible, comparable, and actionable…

Scale through procurement

…and we give governments procurement tools, built on those standards, to buy responsibly at scale…

Shift markets towards responsible AI

…then we shift the market towards AI that actually works for everyone.

Get started

See how your city's AI tools perform.

Select your use case to begin. More benchmarks are added as roundtables complete and testing protocols are validated.

311 Chatbots

01 — Setup

Nominate your city

Provide your contact information and answer a short set of policy questions specific to your use case. Your answers become the ground truth for scoring.

02 — Benchmark

Testing scenarios run automatically

Our pipeline tests your system across task performance, safety, and accessibility scenarios. The run is automated, with flags raised for human review.

03 — Report

A–F grade with full breakdown

A PDF report card with an overall grade, dimension scores, flagged items, and methodology disclosure. Shareable with leadership.
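
For readers curious how dimension scores might roll up into a single letter, here is a minimal sketch in Python. It is an illustration only, not Public Bench's published methodology: the 0–100 scale, the grade cut-offs, the unweighted average, and every name in it (`GRADE_CUTOFFS`, `letter_grade`, `ReportCard`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical cut-offs on an assumed 0-100 scale; the real thresholds
# would come from the published methodology, not from this sketch.
GRADE_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter using the assumed cut-offs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


@dataclass
class ReportCard:
    """Illustrative container for the pieces a report card lists."""
    dimension_scores: Dict[str, float]  # e.g. task performance, safety, accessibility
    flagged_items: List[str]            # items set aside for human review

    @property
    def overall_score(self) -> float:
        # Assumption: a plain unweighted mean of the dimension scores.
        return sum(self.dimension_scores.values()) / len(self.dimension_scores)

    @property
    def overall_grade(self) -> str:
        return letter_grade(self.overall_score)


# Made-up example values, not real benchmark results.
card = ReportCard(
    dimension_scores={"task_performance": 88.0, "safety": 94.0, "accessibility": 73.0},
    flagged_items=["scenario 12: needs human review"],
)
print(card.overall_grade, round(card.overall_score, 1))  # B 85.0
```

A real scoring scheme could weight dimensions differently or cap the overall grade when a safety score falls below a floor; the unweighted mean is used here only because it is the simplest choice to read.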

Get started →