About

Building the accountability infrastructure AI procurement is missing.

Public Bench is an independent benchmarking organization for AI in municipal government. Noncommercial, vendor-neutral, and open by default.

Mission

We exist to close a structural gap.

Hundreds of municipal governments are now contracting for AI in 311, permitting, report writing, and code enforcement, yet they have no shared way to assess quality and no leverage to hold vendors accountable when systems fail residents.

Public Bench designs, runs, and publishes performance tests for the AI systems cities are buying. We produce transparent, defensible report cards that procurement officers, CIOs, and elected officials can use to make better decisions.

We are not a consultant. We do not accept payment from vendors. We do not grade our own work. Our methodology is public and our code is open source.

Approach

Three principles.

01

Independent

No vendor relationships. No self-grading. Judge models are always different from the system under test. We have no financial stake in any benchmark outcome.

02

Peer-validated

Rubrics are built with the practitioners who do the work — 311 operators, procurement officers, attorneys, police. Not with vendors, not in isolation.

03

Open by default

Methodology, rubrics, and judge prompts are public. Cities can rerun every benchmark themselves, for free, using our open-source code.

Current Status

Where we are.

H1

Now → 18 mo

Prove the model

311 benchmark live. Open-source code released. First cities benchmarked.

H2

2–4 yrs

Build the library

A growing catalog of expert-validated benchmarks embedded in procurement workflows.

H3

5+ yrs

Shift the market

Performance standards, cooperative purchasing, and contracts that make trustworthy AI the default in government.

Get in touch.

If you're a city government, researcher, philanthropic funder, or cooperative purchasing administrator, we'd like to hear from you.

hello@publicbench.org