AI changes the rules of product design. Outputs are probabilistic, not deterministic. The system can be wrong, and the user needs a way to know that without losing trust in the product entirely. This case study covers the design work behind a persistent in-product AI agent — a companion that follows enterprise users through their journey — and the decisions required to surface model outputs honestly, handle uncertainty, and collaborate with ML engineers on what the product can and cannot promise.
Enterprise users needed to act on data the model produced, but the model's outputs carried inherent uncertainty that a traditional dashboard would hide. Presenting a number without its confidence range would create false precision; presenting the full distribution would overwhelm. Getting this wrong in either direction had a cost — overconfidence erodes trust when the model is wrong, and excessive hedging makes the output useless.
I worked with the ML team to understand what the model actually knew vs. what it was estimating, and used that distinction to design a tiered disclosure system. High-confidence outputs surfaced cleanly. Lower-confidence outputs included a visible signal (not a statistical footnote, but a design-level indicator built into the component) with a path to understand why. I mapped the thresholds with ML engineers so the UI's confidence language matched what the model's confidence scores actually meant, rather than a rough approximation of it.
The goal was calibrated trust: users should believe the model when it's right, and have enough signal to push back when it isn't.
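The tiered disclosure described above can be sketched as a small mapping from a model confidence score to what the component shows. This is a minimal illustration, not the shipped implementation: the three tier names, the 0.9 and 0.6 cutoffs, the label copy, and the function name `discloseConfidence` are all assumptions made for the example. Real thresholds would come from the calibration work with the ML engineers.

```typescript
// Hypothetical tiers; the case study only says the system was "tiered".
type DisclosureTier = "high" | "medium" | "low";

interface TierThresholds {
  high: number;   // scores at or above this surface cleanly
  medium: number; // scores at or above this (but below high) are "medium"
}

interface Disclosure {
  tier: DisclosureTier;
  showIndicator: boolean; // the design-level signal built into the component
  showWhyLink: boolean;   // the path to understand why
  label: string | null;   // illustrative copy only
}

// Assumed cutoffs for illustration; not values from the project.
const ASSUMED_THRESHOLDS: TierThresholds = { high: 0.9, medium: 0.6 };

function discloseConfidence(
  score: number,
  thresholds: TierThresholds = ASSUMED_THRESHOLDS,
): Disclosure {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`confidence score must be in [0, 1], got ${score}`);
  }
  if (score >= thresholds.high) {
    // High confidence: no indicator, no extra copy.
    return { tier: "high", showIndicator: false, showWhyLink: false, label: null };
  }
  if (score >= thresholds.medium) {
    return { tier: "medium", showIndicator: true, showWhyLink: true, label: "Estimate" };
  }
  return {
    tier: "low",
    showIndicator: true,
    showWhyLink: true,
    label: "Low confidence, verify before acting",
  };
}
```

Keeping the thresholds in one object, separate from the component, means the UI's confidence language can be retuned when the model's calibration changes without touching the design of the indicator itself.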
A one-off AI widget is easy. An agent that follows a user through their entire session — aware of where they came from, what they're trying to do, and what they've already tried — is a fundamentally different design problem. Most enterprise AI surfaces are stateless: you ask a question, you get an answer, context resets. This agent needed to hold context across the journey and use it to be genuinely more useful than asking again from scratch.
I mapped the user's journey through the platform and identified the moments where an aware agent would reduce meaningful friction: transitions between tools, moments of failure or confusion, and decision points where users typically left to look something up elsewhere. I designed the agent's presence to be ambient by default and interruptible on demand — it shouldn't compete for attention, but it should be immediately accessible when needed.
The key design question was what the agent should know vs. what it should ask. I worked through that with product and ML: what context the system could infer reliably, what it should confirm, and where assuming too much would feel intrusive rather than helpful. Those limits shaped the agent's personality and communication patterns: confident where it had signal, and honest where it didn't.
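One way to picture the know-vs-ask decision is as a policy over individual pieces of context. The sketch below is hypothetical throughout: the `ContextSignal` shape, the `reliability` score, the `sensitive` flag, and the 0.85 and 0.5 cutoffs are assumptions for illustration, not the rules the team settled on.

```typescript
// What the agent does with a piece of context it might use.
type ContextAction = "infer" | "confirm" | "ask";

interface ContextSignal {
  name: string;        // e.g. "currentTool" (hypothetical)
  reliability: number; // 0..1, how reliably the system can infer it (assumed metric)
  sensitive: boolean;  // true if silently assuming it could feel intrusive
}

function decideContextAction(
  signal: ContextSignal,
  inferAt = 0.85,  // assumed cutoff
  confirmAt = 0.5, // assumed cutoff
): ContextAction {
  if (signal.sensitive) {
    // Never silently assume sensitive context; at best, confirm it.
    return signal.reliability >= inferAt ? "confirm" : "ask";
  }
  if (signal.reliability >= inferAt) return "infer";
  if (signal.reliability >= confirmAt) return "confirm";
  return "ask";
}

// Usage: a reliable, non-sensitive signal is used without prompting.
const action = decideContextAction({
  name: "currentTool",
  reliability: 0.95,
  sensitive: false,
});
```

The point of the sensitive branch is that reliability alone is not enough: a correct inference can still feel intrusive, so the policy downgrades it to a confirmation.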
Enterprise product designers working on AI surfaces face a constraint that doesn't exist in most product work: the system's behavior is not fully predictable, and the reasons for its decisions are not always surfaceable. Users want to know why. The model doesn't always have a clean answer. The product has to hold that gap without making either the model or the user look bad.
I ran working sessions with ML engineers to understand exactly what the model could and couldn't explain about its own outputs. From that, I built a design vocabulary for explainability: what level of explanation was available for each output type, how to communicate that to users honestly, and how to handle the cases where no explanation could be offered.
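A design vocabulary like this can be captured as a lookup from output type to the level of explanation available and the copy that goes with it. The output types, level names, and copy strings below are invented for illustration; the actual vocabulary came out of the working sessions and is not reproduced here.

```typescript
// Hypothetical explanation levels and output types.
type ExplanationLevel = "full" | "partial" | "none";
type OutputType = "forecast" | "classification" | "recommendation";

interface ExplanationEntry {
  level: ExplanationLevel;
  userCopy: string; // what the UI says about why this output was produced
}

// Record<OutputType, ...> makes the compiler flag any output type left undefined.
const EXPLANATION_VOCABULARY: Record<OutputType, ExplanationEntry> = {
  forecast: {
    level: "full",
    userCopy: "See the inputs that contributed most to this estimate.",
  },
  classification: {
    level: "partial",
    userCopy: "We can show similar past cases, but not the exact reasoning.",
  },
  recommendation: {
    level: "none",
    userCopy: "No explanation is available for this suggestion.",
  },
};

function explanationFor(type: OutputType): ExplanationEntry {
  return EXPLANATION_VOCABULARY[type];
}
```

The "none" entry is deliberate: saying plainly that no explanation is available is part of the vocabulary, not a missing case.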
I also designed for graceful degradation — the states where the model was unavailable, underconfident, or operating outside its training distribution. These weren't edge cases to be minimized in the UI; they were first-class design problems with their own states, copy, and recovery paths. A model failure that a user doesn't notice is a trust problem. A model failure that's handled well is a product moment.
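Treating degraded states as first-class can be sketched as an explicit state type where every state must declare its own copy and recovery path. The state names follow the three cases named above; the message strings, the recovery actions, and the `presentState` function are assumptions for the example.

```typescript
type ModelState = "ok" | "unavailable" | "underconfident" | "outOfDistribution";

interface StatePresentation {
  message: string;         // user-facing copy (illustrative)
  recovery: string | null; // what the user can do next, if anything
}

function presentState(state: ModelState): StatePresentation {
  switch (state) {
    case "ok":
      return { message: "", recovery: null };
    case "unavailable":
      return {
        message: "The assistant is temporarily unavailable.",
        recovery: "Retry, or continue without suggestions",
      };
    case "underconfident":
      return {
        message: "The assistant isn't confident enough to answer this.",
        recovery: "Add more detail or review the source data",
      };
    case "outOfDistribution":
      return {
        message: "This request is outside what the assistant was built to handle.",
        recovery: "Contact support or use the manual workflow",
      };
    default: {
      // Exhaustiveness check: adding a new state without handling it
      // becomes a compile-time error here.
      const unreachable: never = state;
      throw new Error(`unhandled model state: ${String(unreachable)}`);
    }
  }
}
```

The exhaustive switch mirrors the design stance: a new failure mode cannot ship without someone deciding what the user sees and how they recover.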