Dani Weidner — Validation & Experimentation

THREAD 01 / 04

Comparative prototype testing

Old vs new · Rated head-to-head · Customer sessions · Quantified lift

Fig. — New tested against current

The problem

A new drag-and-drop creative builder was meant to replace a workflow where users uploaded full ad creatives as flat images and hand-edited CSS to make changes. "Better" was easy to assert and hard to prove. I needed evidence that the new approach actually beat the old one for the people who'd use it.

What I did

I ran head-to-head testing with enterprise customers, asking them to rate the new builder directly against the system they used today. The gap was decisive: the new builder scored in the high range while the current system sat near the bottom, and one media customer called it a thousand-percent improvement. I captured the qualitative wins too (the ability to test different headlines and calls to action without rebuilding an entire creative), so the rating was backed by specific reasons, not just a number.

"A thousand-percent improvement on the current system."

Enterprise customer, media & publishing

2×

New builder roughly doubled the old system's usability rating head-to-head

2

Independent customer sessions converged on the same verdict

1

Clear winner sent to build with the comparative evidence behind it

THREAD 02 / 04

A/B test flow design

Experiment UI · Acceptance criteria · Customer-validated · Eng-ready

Fig. — Validated against criteria

The problem

The platform needed a built-in A/B testing flow so marketers could test creative variants. Experimentation UI is unforgiving: traffic logic, locking, and the control case all have to behave exactly right, and the people who'd use it run tests for a living and would spot anything that didn't match how experiments actually work.

What I did

I designed the flow against a full set of acceptance criteria and validated it directly with customers who run experiments daily. Sessions confirmed the three core interaction decisions (inline traffic editing, lock behavior, and a split-traffic-equally action) and surfaced a subtle issue: applying a setting at the wrong level would corrupt the metrics. I routed that to engineering as a spike before it could ship. The flow was reviewed and validated live with a customer ahead of its release.
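To make the lock and split-equally interactions concrete, here is a minimal sketch of how they could combine. It is an illustration only: the `Variant` type, the whole-number percentage model, and the rule for handing out leftover points are assumptions for the example, not the shipped behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    name: str
    traffic: int          # percent of traffic, 0-100 (assumed whole numbers)
    locked: bool = False  # locked variants keep their share


def split_equally(variants: List[Variant]) -> List[Variant]:
    """Spread the traffic not held by locked variants evenly across the rest."""
    locked_total = sum(v.traffic for v in variants if v.locked)
    if locked_total > 100:
        raise ValueError("locked variants already exceed 100% of traffic")

    unlocked = [v for v in variants if not v.locked]
    if not unlocked:
        return variants  # nothing to redistribute

    remaining = 100 - locked_total
    share, leftover = divmod(remaining, len(unlocked))
    for i, v in enumerate(unlocked):
        # Hand the leftover percentage points to the first few variants
        # so the total always comes back to exactly 100.
        v.traffic = share + (1 if i < leftover else 0)
    return variants
```

With a control locked at 40% and two unlocked variants, the action would leave the control untouched and give each of the others 30%.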

Tests one variable at a time, creative or settings, never both at once.

Customer experimentation practice, validated in session

17

Acceptance criteria the flow was designed and reviewed against

3

Core interactions validated: inline traffic, lock, split equally

1

Metric-corrupting edge case caught in validation and sent to a spike

THREAD 03 / 04

Holdout group requirements

Workaround analysis · Requirements validation · Preference research · Architecture decision

Fig. — A clean holdout slice

The problem

To measure lift, customers needed a holdout group: a slice of traffic that sees nothing. There was no real holdout feature, so customers faked one with invisible variants set to receive most of the traffic. That hack caused scroll-lock bugs, CSS conflicts, and support tickets. The feature had to be designed from what people were actually doing to cope.

What I did

I documented the workaround in detail and turned it into requirements for a dedicated holdout: small by default, pixel-based, and rendering no creative at all, so measurement stayed clean without the side effects. I validated the surrounding architecture question (whether dismissal settings belong at the campaign or variant level) with multiple customers, and let the split in their answers drive a configurable default rather than forcing one opinion on everyone.
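As a rough illustration of the "renders nothing" requirement, a dedicated holdout could be modeled as an assignment that returns no variant at all, instead of an invisible variant that still loads markup and CSS. This is a sketch under assumptions: the hash-based bucketing, the function names, and the 5% default are hypothetical and are not taken from the product.

```python
import hashlib
from typing import Optional, Sequence


def bucket(user_id: str, campaign_id: str) -> float:
    """Map a user to a stable position in [0, 1) for this campaign."""
    digest = hashlib.sha256(f"{campaign_id}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def assign(user_id: str, campaign_id: str, variants: Sequence[str],
           holdout_share: float = 0.05) -> Optional[str]:
    """Return the variant to render, or None when the user is held out."""
    position = bucket(user_id, campaign_id)
    if position < holdout_share:
        return None  # holdout: render no creative at all
    # Rescale the rest of the range evenly across the real variants.
    scaled = (position - holdout_share) / (1 - holdout_share)
    return variants[int(scaled * len(variants))]
```

Because a held-out user gets `None` rather than a hidden creative, nothing is injected into the page, which is the property that would avoid the scroll-lock and CSS side effects of the workaround.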

Settings are more precise and aligned with business goals. Customers test creatives more than they test settings.

Preference research, holdout architecture

2/3

Customers preferred campaign-level settings, which set the default

1

Dedicated holdout replacing a bug-prone invisible-variant workaround

0

Rendered creative in the holdout: measurement stays clean by design

THREAD 04 / 04

Assumption mapping

Internal alignment · Surfacing disagreement · Evidence plan · De-risking

Fig. — Assumptions made visible

The problem

Before validating a design with customers, the team had to agree on what it even believed. A core system concept was being built on assumptions nobody had checked, and quiet internal disagreement about how it worked would have turned any customer test into noise.

What I did

I ran an internal assumption-mapping session that surfaced more than twenty assumptions about how the system behaved, and it exposed real disagreement on fundamentals, including whether editing a shared template changed work that already existed. Rather than paper over it, I separated what the team genuinely agreed on from what it only assumed, then turned the open questions into a concrete evidence plan: competitor research plus a customer survey routed through the advisory community.

Strong agreement on three things, open disagreement on the rest. Better to find that before we test with customers, not during.

Assumption mapping, validation planning

20+

Assumptions surfaced and sorted into agreed versus unverified

3

Concepts with genuine team agreement, isolated from the rest

2

Evidence streams planned to settle the open questions before testing