Automated ground truth for agents.

Point poseur at a URL. Describe the visitor, the task, the feeling you want to track. Synthetic users visit the site and narrate what happened — ending with a structured signal you can regress on every deploy.

This is what poseur produces

A role-playing synthetic visitor with a specific persona, specific stakes, and a specific emotional signal to track. The visitor produces a timestamped inner monologue while using your website and ends with a structured signal line.

URL: https://example-ai.com
Persona: skeptical developer, first visit, no prior context
Signal: trust (does this page earn a demo booking in under a minute?)
[00:05] Page loads. Headline: "AI for your team." Vague.
[00:15] Scrolling. Testimonials from companies I don't recognize. Not a trust signal.
[00:28] Still unclear what this actually does. No screenshot, no demo link.
[00:40] "Book a demo" — the only CTA. I don't book demos for things I don't understand yet.
[00:52] Back button. I'm out.
[SIGNAL] decreased; moderate; the page asked for a demo booking before earning enough trust to justify one

That [SIGNAL] line is structured. Direction, intensity, key moment. Aggregate across hundreds of runs, plot over time, diff across deploys. A qualitative signal you can regress on.
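To make "structured" concrete, here is a minimal sketch of parsing that line and aggregating it across runs. It assumes the format is exactly what the single example above shows: three semicolon-separated fields after the [SIGNAL] tag. The names Signal, parse_signal, and direction_share are hypothetical helpers for illustration, not part of poseur.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Assumed format, inferred from the one example log above:
#   [SIGNAL] <direction>; <intensity>; <key moment>
SIGNAL_RE = re.compile(r"^\[SIGNAL\]\s*([^;]+);\s*([^;]+);\s*(.+)$")


@dataclass(frozen=True)
class Signal:
    direction: str   # e.g. "decreased"
    intensity: str   # e.g. "moderate"
    key_moment: str  # free-text explanation; may itself contain semicolons


def parse_signal(log: str) -> Optional[Signal]:
    """Return the last [SIGNAL] line found in a log, or None if absent."""
    for line in reversed(log.splitlines()):
        match = SIGNAL_RE.match(line.strip())
        if match:
            direction, intensity, key_moment = (g.strip() for g in match.groups())
            return Signal(direction, intensity, key_moment)
    return None


def direction_share(signals: list[Signal], direction: str) -> float:
    """Fraction of runs whose signal moved in the given direction."""
    if not signals:
        return 0.0
    counts = Counter(s.direction for s in signals)
    return counts[direction] / len(signals)
```

Computing direction_share(signals, "decreased") for the runs of two deploys and comparing the two numbers is one simple way to diff across deploys.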

What your current evals miss

Unit tests say pass

But the landing page buries pricing. Your visitor bounces. Your tests don't care.

Thumbs-up says 👍

A scalar. You don't know what the visitor liked, hated, or silently stopped trusting.

LLM-as-judge says 8/10

An AI graded your AI against a rubric. Whose taste is that, exactly?

Human evals say good

Three weeks ago, in a scheduled session you can't re-run on today's deploy.

Three inputs, one output

Environment

The visitor's context. Device, browser, network, session state. Plain text.

Scenario

What they're trying to do on your site. Who they are, when they'd give up.

Signal

The emotional dimension that matters to your visitors, such as trust.
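As a rough illustration of how the three inputs fit together, the sketch below bundles them with a target URL into one record. The Run class and its field names are hypothetical and do not reflect poseur's actual schema. The environment and give-up condition are made-up illustrative values; the URL, persona, and signal come from the example log earlier on this page.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical container for one run; not poseur's real schema."""
    url: str          # site to visit
    environment: str  # plain-text visitor context
    scenario: str     # who the visitor is, their task, when they give up
    signal: str       # emotional dimension to track


run = Run(
    url="https://example-ai.com",
    # Illustrative environment; any plain-text description would do.
    environment="desktop browser, fast connection, no prior session",
    scenario=(
        "skeptical developer, first visit, no prior context; "
        "gives up if the product is still unclear after a minute"
    ),
    signal="trust",
)

# Serialize the inputs so they can be stored next to the resulting log.
print(json.dumps(asdict(run), indent=2))
```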

Who it's for

Teams shipping AI agents whose web-facing surface is the product — landing pages, chat UIs, onboarding flows, playgrounds. Wire poseur into CI. Know whether your new prompt, new model, or new layout actually landed.

Also useful for: product teams scanning their own public sites for friction real users feel but don't report.

Poseur only scans sites you've proven you own. Verification is one file upload away.

Get started

Poseur scans public websites you own. Sign in with Google, verify a domain, and watch the experience log stream back.