PROVING THE NEW WAY BEATS THE OLD ONE, WITH EVIDENCE

A design isn't done because it's shipped. I test new work against the thing it replaces, validate the architecture of experimentation features with the people who run experiments for a living, and ground interaction decisions in how customers actually test today. This page collects four threads where evidence, not opinion, settled the design.

Role: Lead Product Designer
Discipline: Comparative testing, A/B design, preference research
Context: Enterprise B2B SaaS platform
Threads in this study: 04
THREAD 01 / 04
Comparative prototype testing
Old vs new · Rated head-to-head · Customer sessions · Quantified lift
Fig. New tested against current (chart labels: NEW rated HIGH, CURRENT rated LOW)
The problem

A new drag-and-drop creative builder was meant to replace a workflow where users uploaded full ad creatives as flat images and hand-edited CSS to make changes. "Better" was easy to assert and hard to prove. I needed evidence that the new approach actually beat the old one for the people who'd use it.

What I did

I ran head-to-head testing with enterprise customers, asking them to rate the new builder directly against the system they used today. The gap was decisive: the new builder scored in the high range while the current system sat near the bottom, and one media customer called it a thousand-percent improvement. I also captured the qualitative wins, such as the ability to test different headlines and calls to action without rebuilding an entire creative, so the rating was backed by specific reasons and not just a number.

"A thousand-percent improvement on the current system."
Enterprise customer, media & publishing
New builder roughly doubled the old system's usability rating head-to-head
2
Independent customer sessions converged on the same verdict
1
Clear winner sent to build with the comparative evidence behind it
THREAD 02 / 04
A/B test flow design
Experiment UI · Acceptance criteria · Customer-validated · Eng-ready
Fig. Validated against criteria (diagram labels: A, B, CTRL, VARIANTS, CRITERIA)
The problem

The platform needed a built-in A/B testing flow so marketers could test creative variants. Experimentation UI is unforgiving: traffic logic, locking, and the control case all have to behave exactly right, and the people who'd use it run tests for a living and would spot anything that didn't match how experiments actually work.

What I did

I designed the flow against a full set of acceptance criteria and validated it directly with customers who run experiments daily. Sessions confirmed the three core interaction decisions: inline traffic editing, lock behavior, and a split-traffic-equally action. They also surfaced a subtle issue where applying a setting at the wrong level would corrupt the metrics, which I routed to engineering as a spike before it could ship. The flow was reviewed and validated live with a customer ahead of its release.
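To make the lock and split-equally interactions concrete, here is a minimal sketch of one way they could combine: locked variants keep their share, and the remaining traffic is divided evenly among the unlocked ones. This is an illustration, not the platform's implementation. The `Variant` type, whole-percent traffic values, and the rule that leftover points go to the first unlocked variants are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    """Hypothetical variant record: traffic is a whole percentage (0-100)."""
    name: str
    traffic: int
    locked: bool = False


def split_equally(variants: List[Variant]) -> List[Variant]:
    """Return new variants with unlocked traffic split as evenly as possible.

    Locked variants keep their current share. Any remainder from integer
    division is handed out one point at a time to the first unlocked
    variants (an assumed tie-break rule) so the total stays at 100.
    """
    locked_total = sum(v.traffic for v in variants if v.locked)
    remaining = 100 - locked_total
    if remaining < 0:
        raise ValueError("locked variants already exceed 100% of traffic")

    unlocked_count = sum(1 for v in variants if not v.locked)
    if unlocked_count == 0:
        # Nothing to redistribute: return copies unchanged.
        return [Variant(v.name, v.traffic, v.locked) for v in variants]

    base, extra = divmod(remaining, unlocked_count)
    result = []
    seen_unlocked = 0
    for v in variants:
        if v.locked:
            result.append(Variant(v.name, v.traffic, True))
        else:
            share = base + (1 if seen_unlocked < extra else 0)
            result.append(Variant(v.name, share, False))
            seen_unlocked += 1
    return result
```

With a control locked at 40 and three unlocked variants, the sketch gives each unlocked variant 20. Returning new objects instead of mutating in place keeps the previous split available, which is convenient if the UI needs an undo.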

"Tests one variable at a time, creative or settings, never both at once."
Customer experimentation practice, validated in session
17
Acceptance criteria the flow was designed and reviewed against
3
Core interactions validated: inline traffic, lock, split equally
1
Metric-corrupting edge case caught in validation and sent to a spike
THREAD 03 / 04
Holdout group requirements
Workaround analysis · Requirements validation · Preference research · Architecture decision
Fig. A clean holdout slice (diagram labels: VAR A, VAR B, HOLDOUT (empty), TRAFFIC SPLIT)
The problem

To measure lift, customers needed a holdout group: a slice of traffic that sees nothing. There was no real holdout feature, so customers faked one with invisible variants assigned most of the traffic. That hack caused scroll-lock bugs, CSS conflicts, and support tickets. The feature had to be designed from what people were actually doing to cope.
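For readers less familiar with why a holdout matters, the arithmetic can be sketched as follows. This assumes lift is measured as the relative difference in conversion rate between users who could see a creative and users in the holdout. The function name and parameters are hypothetical, and nothing here reflects how the platform computes its metrics.

```python
def relative_lift(
    exposed_conversions: int,
    exposed_users: int,
    holdout_conversions: int,
    holdout_users: int,
) -> float:
    """Relative lift of the exposed group over the holdout baseline.

    lift = (exposed_rate - holdout_rate) / holdout_rate
    """
    if exposed_users <= 0 or holdout_users <= 0:
        raise ValueError("both groups need at least one user")

    exposed_rate = exposed_conversions / exposed_users
    holdout_rate = holdout_conversions / holdout_users
    if holdout_rate == 0:
        raise ValueError("holdout rate is zero, so relative lift is undefined")

    return (exposed_rate - holdout_rate) / holdout_rate
```

The baseline only means something if the holdout truly sees nothing. An invisible variant that still loads into the page is not a clean baseline, which is why the workaround described above was a measurement problem as well as a source of bugs.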

What I did

I documented the workaround in detail and turned it into requirements for a dedicated holdout: small by default, pixel-based, and rendering no creative at all, so measurement stayed clean without the side effects. I also validated the surrounding architecture question (whether dismissal settings belong at the campaign or variant level) with multiple customers, and let the split in their answers drive a configurable default instead of forcing one opinion on everyone.
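The core requirement, a holdout that renders nothing, can be sketched as a deterministic assignment step. This is an illustration under stated assumptions, not the shipped design: hashing the user and campaign IDs into 100 buckets, the `assign` function, and the even spread across variants are all hypothetical choices.

```python
import hashlib
from typing import List, Optional


def bucket(user_id: str, campaign_id: str) -> int:
    """Map a user to a stable bucket in the range 0-99 for one campaign."""
    digest = hashlib.sha256(f"{campaign_id}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def assign(
    user_id: str,
    campaign_id: str,
    holdout_percent: int,
    variants: List[str],
) -> Optional[str]:
    """Return the variant to render, or None when the user is in the holdout.

    None means "render no creative at all", in contrast to the workaround
    of rendering an invisible variant.
    """
    if not 0 <= holdout_percent <= 100:
        raise ValueError("holdout_percent must be between 0 and 100")
    if not variants:
        raise ValueError("at least one variant is required")

    b = bucket(user_id, campaign_id)
    if b < holdout_percent:
        return None  # Holdout: nothing is rendered for this user.

    # Spread the non-holdout buckets evenly across the variants.
    index = (b - holdout_percent) * len(variants) // (100 - holdout_percent)
    return variants[index]
```

Because the bucket depends only on the two IDs, the same user lands in the same group on every visit, which keeps the holdout stable for the length of a campaign.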

"Settings are more precise and aligned with business goals. Customers test creatives more than they test settings."
Preference research, holdout architecture
2/3
Customers preferred campaign-level settings, which set the default
1
Dedicated holdout replacing a bug-prone invisible-variant workaround
0
Rendered creative in the holdout: measurement stays clean by design
THREAD 04 / 04
Assumption mapping
Internal alignment · Surfacing disagreement · Evidence plan · De-risking
Fig. Assumptions made visible (diagram label: ASSUMPTIONS)
The problem

Before validating a design with customers, the team had to agree on what it even believed. A core system concept was being built on assumptions nobody had checked, and quiet internal disagreement about how it worked would have turned any customer test into noise.

What I did

I ran an internal assumption-mapping session that surfaced more than twenty assumptions about how the system behaved, and it exposed real disagreement on fundamentals, including whether editing a shared template changed work that already existed. Rather than paper over it, I separated what the team genuinely agreed on from what it only assumed, then turned the open questions into a concrete evidence plan: competitor research plus a customer survey routed through the advisory community.

"Strong agreement on three things, open disagreement on the rest. Better to find that before we test with customers, not during."
Assumption mapping, validation planning
20+
Assumptions surfaced and sorted into agreed versus unverified
3
Concepts with genuine team agreement, isolated from the rest
2
Evidence streams planned to settle the open questions before testing