A — 3 / CASE STUDY
DESIGNING FOR WHAT THE MODEL DOES NOT KNOW

AI changes the rules of product design. Outputs are probabilistic, not deterministic. The system can be wrong, and the user needs a way to know that without losing trust in the product entirely. This case study covers the design work behind a persistent in-product AI agent — a companion that follows enterprise users through their journey — and the decisions required to surface model outputs honestly, handle uncertainty, and collaborate with ML engineers on what the product can and cannot promise.

[Figure: AI agent diagram with labels User, Context, Actions, Insights]
Role: Lead Product Designer
Discipline: AI product design, ML collaboration, trust & uncertainty, agentic UX
Context: Enterprise B2B SaaS platform
Threads in this study: 03
THREAD 01 / 03
Surfacing model outputs
AI output design · Confidence & uncertainty · Trust calibration · Progressive disclosure
[Figure: model output at HIGH, MED, and LOW confidence]
Fig. — Confidence sets the disclosure
The problem

Enterprise users needed to act on data the model produced, but the model's outputs carried inherent uncertainty that a traditional dashboard would hide. Presenting a number without its confidence range would create false precision; presenting the full distribution would overwhelm. Getting this wrong in either direction had a cost — overconfidence erodes trust when the model is wrong, and excessive hedging makes the output useless.

What I did

I worked with the ML team to understand what the model actually knew vs. what it was estimating, and used that distinction to design a tiered disclosure system. High-confidence outputs surfaced cleanly. Lower-confidence outputs included a visible signal — not a statistical footnote, but a design-level indicator built into the component — with a path to understand why. I mapped the thresholds with ML engineers so the UI's confidence language matched what the model's confidence scores actually meant, not a rough approximation.

The goal was calibrated trust: users should believe the model when it's right, and have enough signal to push back when it isn't.
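
As a rough illustration of the tiered disclosure described above, here is a minimal Python sketch. It assumes the model exposes a confidence score between 0 and 1. The cut-off values (0.85 and 0.60) and the names `Tier`, `Disclosure`, `tier_for`, and `disclosure_for` are hypothetical placeholders, not the thresholds or component API from the actual project.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HIGH = "high"
    MED = "med"
    LOW = "low"


# Hypothetical cut-offs. In practice these would come from the threshold
# map agreed with the ML team, so the UI language matches what the
# scores actually mean.
THRESHOLDS = {Tier.HIGH: 0.85, Tier.MED: 0.60}


@dataclass(frozen=True)
class Disclosure:
    tier: Tier
    show_indicator: bool  # visible signal built into the component
    show_why_link: bool   # path for the user to understand why


def tier_for(score: float) -> Tier:
    """Map a confidence score in [0, 1] to a disclosure tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score out of range: {score}")
    if score >= THRESHOLDS[Tier.HIGH]:
        return Tier.HIGH
    if score >= THRESHOLDS[Tier.MED]:
        return Tier.MED
    return Tier.LOW


def disclosure_for(score: float) -> Disclosure:
    """High-confidence outputs surface cleanly; lower tiers carry a signal."""
    tier = tier_for(score)
    needs_signal = tier is not Tier.HIGH
    return Disclosure(tier, show_indicator=needs_signal, show_why_link=needs_signal)
```

Keeping the thresholds in one shared table, rather than scattered through components, is one way to keep design and ML working from the same definitions.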

Hiding uncertainty doesn't make a product feel smarter. It makes it feel wrong when it's wrong and untrustworthy when users find out.
Design principle, AI output surfaces
3 confidence tiers designed into the component system, not bolted on as tooltips
0 instances of statistical jargon in user-facing copy; all uncertainty framed in workflow terms
1 shared confidence threshold map between design and ML, with the same language on both sides
THREAD 02 / 03
Designing the agent
Persistent AI companion · Agentic UX · Journey-aware context · Interruptibility
[Figure: agent alongside the user journey]
Fig. — One agent across the journey
The problem

A one-off AI widget is easy. An agent that follows a user through their entire session — aware of where they came from, what they're trying to do, and what they've already tried — is a fundamentally different design problem. Most enterprise AI surfaces are stateless: you ask a question, you get an answer, context resets. This agent needed to hold context across the journey and use it to be genuinely more useful than asking again from scratch.

What I did

I mapped the user's journey through the platform and identified the moments where an aware agent would reduce meaningful friction: transitions between tools, moments of failure or confusion, and decision points where users typically left to look something up elsewhere. I designed the agent's presence to be ambient by default and interruptible on demand — it shouldn't compete for attention, but it should be immediately accessible when needed.

The key design question was what the agent should know vs. what it should ask. I worked through that with product and ML: what context the system could infer reliably, what it should confirm, and where assuming too much would feel intrusive rather than helpful. Those limits shaped the agent's personality and communication patterns: confident where it had signal, and honest where it didn't.
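
The know-vs-ask decision can be sketched as a small policy function. This is an illustrative Python sketch, not the shipped logic: the `reliability` score, the `high_stakes` flag, both cut-off values, and the names `Handling` and `handling_for` are assumptions introduced here.

```python
from enum import Enum


class Handling(Enum):
    INFER = "use inferred context silently"
    CONFIRM = "show inferred context and ask the user to confirm"
    ASK = "ask the user directly"


def handling_for(
    reliability: float,
    high_stakes: bool,
    reliable_at: float = 0.8,   # hypothetical cut-off for "infer reliably"
    plausible_at: float = 0.5,  # hypothetical floor for offering a guess
) -> Handling:
    """Decide how the agent treats one piece of inferred context."""
    if reliability < plausible_at:
        # Too weak to present as a guess; assuming would feel intrusive.
        return Handling.ASK
    if high_stakes or reliability < reliable_at:
        # High-stakes moments always get a confirmation step.
        return Handling.CONFIRM
    return Handling.INFER
```

Under these assumptions, `handling_for(0.95, high_stakes=True)` returns `Handling.CONFIRM`: even strong inferences are confirmed when the stakes are high.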

An agent that knows where you are but doesn't know what you're trying to do is still just a search box with extra steps.
Design constraint, agentic context model
5+ journey touchpoints where agent presence was explicitly designed, not assumed
2 modes: ambient (low footprint) and engaged (full context surface)
0 cases of assumed context passed to the user without confirmation at high-stakes moments
THREAD 03 / 03
Working with ML constraints
ML collaboration · Design within model limits · Explainability · Graceful degradation
[Figure: model output, explainability, graded explanation]
Fig. — Graded by what is explainable
The problem

Enterprise product designers working on AI surfaces face a constraint that doesn't exist in most product work: the system's behavior is not fully predictable, and the reasons for its decisions are not always surfaceable. Users want to know why. The model doesn't always have a clean answer. The product has to hold that gap without making either the model or the user look bad.

What I did

I ran working sessions with ML engineers to understand exactly what the model could and couldn't explain about its own outputs. From that, I built a design vocabulary for explainability: what level of explanation was available for each output type, how to communicate that to users honestly, and how to handle the cases where no explanation could be offered.

I also designed for graceful degradation — the states where the model was unavailable, underconfident, or operating outside its training distribution. These weren't edge cases to be minimized in the UI; they were first-class design problems with their own states, copy, and recovery paths. A model failure that a user doesn't notice is a trust problem. A model failure that's handled well is a product moment.
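
One way to treat degraded states as first-class is to model them explicitly, each with its own copy and recovery path. The Python sketch below is a hypothetical illustration: the state names follow the three degraded states described here, but the message text, recovery actions, the 0.6 cut-off, and the function names are placeholders rather than the product's real copy or logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ModelState(Enum):
    OK = "ok"
    UNAVAILABLE = "unavailable"
    LOW_SIGNAL = "low-signal"
    OUT_OF_DISTRIBUTION = "out-of-distribution"


@dataclass(frozen=True)
class DegradedView:
    message: str   # user-facing copy, framed in workflow terms
    recovery: str  # what the user can do next


# Placeholder copy for illustration only.
DEGRADED_VIEWS = {
    ModelState.UNAVAILABLE: DegradedView(
        "Suggestions are not available right now.", "Retry or continue manually"
    ),
    ModelState.LOW_SIGNAL: DegradedView(
        "There is not enough data to be confident here.", "Review the inputs"
    ),
    ModelState.OUT_OF_DISTRIBUTION: DegradedView(
        "This looks unlike the cases the model was built for.", "Check with a specialist"
    ),
}


def resolve_state(
    reachable: bool,
    confidence: Optional[float],
    in_distribution: bool,
    low_signal_below: float = 0.6,  # hypothetical cut-off
) -> ModelState:
    """Pick exactly one state so the UI never has to guess."""
    if not reachable or confidence is None:
        return ModelState.UNAVAILABLE
    if not in_distribution:
        return ModelState.OUT_OF_DISTRIBUTION
    if confidence < low_signal_below:
        return ModelState.LOW_SIGNAL
    return ModelState.OK


def view_for(state: ModelState) -> Optional[DegradedView]:
    """Return the designed degraded view, or None when the model is healthy."""
    return DEGRADED_VIEWS.get(state)
```

Because every non-OK state has an entry in the table, a model failure always resolves to designed copy and a next step rather than a blank or a generic error.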

If the model can't explain it, the product still has to say something. "I don't know" is a valid answer, but it has to be designed.
Design principle, explainability systems
4 explainability tiers mapped from ML capability to design language
3 degraded states designed as first-class experiences: unavailable, low-signal, out-of-distribution
1 shared ML-design working model for what the product can and cannot promise