Case Study · AI Product Design

Zoom AI Agent

The AI that doesn't just sit in your meetings. It helps you actually finish them.

Role

Senior Product Designer

Timeline

12 weeks

Tools

Figma, FigJam, Dovetail

Platform

Desktop · Web

Type

0→1 Feature

It's 5:17 pm. Your product review just ended. Six decisions made. Twelve things someone said they'd do. You left the call feeling like things finally moved.

By Wednesday afternoon, your PM pings: "Wait, did we decide on March or April?" You check your notes — a wall of half-sentences and timestamps. You're not sure.

By Thursday, the engineer who said he'd "look into it" hasn't. Not because he forgot, exactly. He thought you owned it. You thought he did.

By Friday, there's a calendar invite in your inbox: Meeting follow-up.

You just scheduled a meeting about a meeting.

The broken pattern

This isn't a you problem. It's a systems problem.

Every team has a version of this story. The meeting felt productive. The week after it didn't.

14.8 hrs

the average professional spends in meetings every week — that's 37% of a 40-hour workweek.

via Asana State of Work 2024

44%

of action items from meetings never get completed. Nearly half of everything decided just stops.

via meetingtoll.com

70%

of decisions made in meetings are forgotten within 24 hours. Not misremembered. Gone.

via fellow.ai research

3–4×

the number of times teams reschedule the same topic before it finally gets resolved.

via meeting analytics research

The tools that exist today — Otter, Fireflies, Copilot, Grain — are all solving a different problem. They record. They transcribe. They summarize. They are very good at telling you what happened in a meeting.

None of them help you do anything about it.

What we were actually building

Not better notes. Execution infrastructure.

Zoom AI Companion already had excellent transcription and summarization. That wasn't the gap. The gap was everything that should happen between a decision being made and a decision being acted on.

Not this

Better transcription

Smarter summaries

A meeting chatbot

This instead

Captures decisions as they happen

Assigns ownership with context

Follows up so you don't have to

The meeting is not the product. What happens after the meeting is the product. We were building the connective tissue between a decision and its outcome.

Who I was designing for

The team lead

PM · EM · Squad lead

Runs 8–12 meetings a week. Accountable for outcomes. Spends Monday morning piecing together what happened last Thursday.

The IC participant

Engineer · Designer

Gets assigned work verbally. Finds out Wednesday what they were supposed to start Monday. Ownership is always ambiguous.

The manager

Director · VP

Needs visibility into whether anything is actually moving. Finds out about blockers when it's already too late to unblock them.

The solution, across four moments

A meeting has a before, a during, and an after. Most tools only cover the during.

The agent lives across all four moments: before, during, after, and an always-on approval layer, with a shared memory underneath. Each one hands off to the next. Nothing falls through the gap.

BEFORE · Pre-Meeting Brief

➡️ How does a team walk into a meeting knowing exactly what's unfinished from last time?

Fifteen minutes before the standup, the agent sends a brief. Not a summary — a status report. Three things from last week that still aren't done. Two decisions that were made but never actioned. One recurring blocker that has now come up in four meetings.

You walk into the meeting already knowing where the bodies are buried.

🎯 The meeting starts with shared context, not 10 minutes of "wait, where did we land on that?"

The brief is personalized to your role. The team lead sees ownership gaps. The IC sees only their open items. The manager sees team-level blockers. Same data, different lenses.
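One way to picture "same data, different lenses" is as three filters over a single list of open items. The sketch below is illustrative only: `OpenItem`, `brief_for`, the role strings, and the field names are all hypothetical and not taken from the actual design spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OpenItem:
    title: str
    owner: Optional[str]      # None means nobody has claimed it yet
    is_blocker: bool = False


def brief_for(role: str, viewer: str, items: list[OpenItem]) -> list[OpenItem]:
    """Return the slice of open items a given role would see in the brief."""
    if role == "team_lead":
        # Team leads see ownership gaps: items with no owner.
        return [i for i in items if i.owner is None]
    if role == "ic":
        # ICs see only their own open items.
        return [i for i in items if i.owner == viewer]
    if role == "manager":
        # Managers see team-level blockers.
        return [i for i in items if i.is_blocker]
    raise ValueError(f"unknown role: {role}")


# Hypothetical sample data shared by every lens.
items = [
    OpenItem("Pick launch month", owner=None),
    OpenItem("Fix login bug", owner="sam"),
    OpenItem("Vendor contract", owner="ana", is_blocker=True),
]
```

Calling `brief_for("ic", "sam", items)` returns only Sam's item, while the team lead and manager lenses read from the same list without any separate data store.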

You stopped walking into meetings blind. The agent told you what was broken before anyone had to say it out loud. 📋

LIVE · In-Meeting Co-pilot

➡️ How does the AI capture what's decided without interrupting the conversation?

The agent runs silently in a sidebar. As the conversation moves, it detects decisions and action items in real time — not after the call, during it. You can confirm, correct, or reassign with a single tap while the meeting is still live.

Nothing goes into the system without you seeing it first. The agent shows its work inline, so you know exactly why it flagged something.

🎯 By the time the meeting ends, the action items are already structured, owned, and ready. Not a wall of text. A list of commitments.

The agent is intentionally light during the meeting. No popups. No audio. It surfaces things in a way that doesn't demand your attention — but is ready when you look.

When someone says "I'll take that" — it assigns it. When someone says "let's decide by EOD" — it sets the deadline. When two people talk over each other about ownership — it flags the conflict for you to resolve.
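The three behaviors above can be sketched as simple rules over a single transcript turn. This is a toy, assuming plain regex matching; a real detector would almost certainly rely on a language model rather than patterns. The names `Flag`, `flag_turn`, `CLAIM`, and `DEADLINE` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flag:
    kind: str                 # "assign", "deadline", or "conflict"
    speaker: Optional[str]    # who the flag is attributed to, if anyone
    reason: str               # one-line explanation shown inline


# Hypothetical phrase patterns, for illustration only.
CLAIM = re.compile(r"\bI'?ll take (that|it|this)\b", re.IGNORECASE)
DEADLINE = re.compile(r"\bby (EOD|end of day|\w+day)\b", re.IGNORECASE)


def flag_turn(speaker: str, text: str,
              already_claimed_by: Optional[str] = None) -> list[Flag]:
    """Turn one spoken line into proposed flags for the user to confirm."""
    flags = []
    if CLAIM.search(text):
        if already_claimed_by and already_claimed_by != speaker:
            # Two people claimed the same item: surface it, do not pick a winner.
            flags.append(Flag("conflict", None,
                              f"{already_claimed_by} and {speaker} both claimed this"))
        else:
            flags.append(Flag("assign", speaker, f'{speaker} said "{text}"'))
    match = DEADLINE.search(text)
    if match:
        flags.append(Flag("deadline", speaker, f"deadline phrase: {match.group(0)}"))
    return flags
```

Every flag carries a `reason`, so the sidebar can show why something was captured before the user confirms, corrects, or reassigns it.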

The engineer didn't take notes. He didn't need to. His two action items were already in his inbox when the call ended. ⚡

AFTER · Execution Handoff

➡️ How do you turn a 45-minute meeting into a clear plan in under 2 minutes?

Within minutes of the call ending, the agent generates a structured handoff. Not a transcript. Not a summary paragraph. A proper execution document: decisions made, open questions, owners, deadlines, and a draft follow-up message ready to send.

You review it, make any edits, and approve. The agent sends it. Everyone who was in the meeting — and everyone who wasn't — gets the same version of what happened.

🎯 The "can you send meeting notes?" message stops existing. The handoff is already in everyone's inbox.

The handoff is also where the agent nudges. If a deadline is set for Friday and it's now Thursday morning, it sends a quiet check-in. Not spam — a single, well-timed prompt.

The PM didn't write the follow-up email. The agent drafted it. She changed two words and hit send. It took 40 seconds. 🤝

ALWAYS · Approval & Trust Controls

➡️ How do you give the AI room to act without losing control of what it does?

By default, the agent never sends, assigns, or acts on anything without your approval first. Every proposed action shows up as a card: what the AI wants to do, why it thinks that, and three options: approve, edit, or dismiss.

Over time, if you keep approving certain types of actions without editing them, the agent learns that pattern and can do those things automatically. Trust is earned, not assumed.

🎯 You control how much the agent does on its own. It starts conservative. It gets more capable as you get more comfortable.

The trust controls live in a settings panel. You can set autonomy by action type: "send follow-ups automatically" but "always ask before assigning to someone else." Granular, predictable, yours.

Every automated action shows up in a log. Not buried in settings — visible in the main interface, always. You can undo anything within 2 hours.
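The 2-hour undo window is easy to express as a rule on each log entry. The sketch below is a minimal illustration; `LoggedAction` and its fields are hypothetical names, and only the 2-hour figure comes from the design itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

UNDO_WINDOW = timedelta(hours=2)  # from the design: undo anything within 2 hours


@dataclass
class LoggedAction:
    description: str
    performed_at: datetime
    undone: bool = False

    def can_undo(self, now: datetime) -> bool:
        """An action is reversible only once, and only inside the window."""
        return not self.undone and now - self.performed_at <= UNDO_WINDOW

    def undo(self, now: datetime) -> None:
        if not self.can_undo(now):
            raise ValueError("undo window has passed or action already undone")
        self.undone = True
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the rule easy to test and makes it possible to show a countdown next to each entry in the log.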

She didn't worry about the agent doing something weird. She could see exactly what it had done — and undo any of it. That's what made her comfortable letting it do more. 🔒

MEMORY · Cross-Meeting Memory

➡️ How do you stop relitigating things that were decided three meetings ago?

The agent keeps a running memory of your team's decisions, open questions, and recurring themes — across every meeting, going back months. Ask it anything: "What did we decide about the API versioning approach?" It tells you, with context, with the date, with who was in the room.

The memory view surfaces patterns you wouldn't see otherwise: the topic that keeps getting raised and never resolved, the person whose items are always overdue, the decision revisited four times.

🎯 The team stopped asking "didn't we discuss this before?" They already knew the answer before anyone opened their mouth.

This is the compound interest feature. Each meeting adds to the memory. The more you use it, the more useful it becomes. It turns a series of disconnected conversations into something that actually learns.

The new engineer asked what the team had decided about auth. The agent answered in 8 seconds. Nobody had to find the right Slack thread. 🧠

The hardest design problem

People don't want an AI that acts. They want an AI that helps them act.

This distinction drove more decisions in this project than anything else.

Every tool I analyzed had the same failure mode: it would do things on your behalf and then tell you about it. Users hated it. Not because the actions were wrong. Because they didn't feel in control. Autonomy without visibility breeds anxiety, not trust.

Show your work

Every AI action includes a one-line explanation: why it flagged this, why it assigned this person, why it set this deadline. Not buried in a tooltip — right there, visible. Users could agree, disagree, or correct it.

Ask before acting

The agent proposes; the human decides. Every single time, until the user actively changes that setting. Trust is built by seeing the agent make good suggestions repeatedly — not by it acting before you're ready.

Earn autonomy over time

The more consistently you approve a type of action without editing it, the more the agent infers you trust it for that specific thing. Autonomy is task-specific, not global.

Undo anything

Every automated action is reversible for 2 hours. When you know you can undo it, you're much more willing to let it try. The undo log is visible, prominent, and always one tap away.

The Autonomy Dial

Instead of a binary AI on/off toggle, the system has a spectrum. Level 1: suggest only. Level 2: draft for review. Level 3: act on low-stakes items. Level 4: fully autonomous within defined limits. Users start at Level 1. Moving up requires deliberate action, not just time.
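The four levels map naturally onto an ordered enum with one gate function. This is a sketch under assumptions: the names `Autonomy` and `needs_approval` are hypothetical, and the `low_stakes` and `within_limits` flags stand in for whatever classification the real system would use.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    SUGGEST_ONLY = 1      # Level 1: suggest only
    DRAFT_FOR_REVIEW = 2  # Level 2: draft for review
    ACT_LOW_STAKES = 3    # Level 3: act on low-stakes items
    AUTONOMOUS = 4        # Level 4: fully autonomous within defined limits


def needs_approval(level: Autonomy, low_stakes: bool, within_limits: bool) -> bool:
    """Return True when a proposed action must wait for the user's approval."""
    if level >= Autonomy.AUTONOMOUS:
        # Level 4 acts alone only inside its defined limits.
        return not within_limits
    if level == Autonomy.ACT_LOW_STAKES:
        # Assumption: at Level 3, being low-stakes is the only test applied.
        return not low_stakes
    return True  # Levels 1 and 2 never act on their own


DEFAULT_LEVEL = Autonomy.SUGGEST_ONLY  # users start at Level 1
```

Because the level only changes when the user sets it, nothing in this gate promotes itself over time, which matches the rule that moving up requires deliberate action.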

The design process, start to finish

Here's what I actually did for 12 weeks.

None of this was obvious. Several things I was sure about were wrong.

Assumptions · 8 Interviews · 7 Competitors · Journey Maps · Scoping

Starting with what I assumed — and what turned out to be wrong

Before any interviews, I listed my assumptions: people forget action items, ownership is unclear, post-meeting follow-up is manual and slow. All of those were true. But two things surprised me.

First: people don't hate meetings as much as the internet says they do. They hate unproductive meetings — specifically, meetings that don't lead to anything. The meeting isn't the villain. The silence after it is.

Second: the fear around AI wasn't about privacy or accuracy. It was about accountability. "If the AI assigns something to someone, who's responsible if it gets it wrong?" That became the central design question for the trust layer.

What 8 people taught me

I interviewed 8 people across roles: 3 PMs, 2 engineering managers, 2 ICs, and 1 director. All active Zoom users. All in at least 6 meetings a week.

"I spend Sunday evening looking at my calendar trying to remember what I agreed to in the meetings from last Thursday."

— PM, mid-size tech company

"The worst part isn't the meeting. It's when someone messages me a week later saying 'did you do that thing?' and I have no idea what thing they mean."

— Senior Engineer

"I've tried every note-taking tool. The problem isn't the notes. The problem is that nobody does anything with them."

— Engineering Manager

What 7 competitors all got wrong

I audited Otter.ai, Fireflies, Microsoft Copilot, Grain, Loom, Notion AI, and Fellow across five dimensions: capture quality, action extraction, follow-up support, cross-meeting memory, and trust controls.

Solved · Transcription & capture · All 7 tools do this well enough

Solved · Meeting summaries · Commoditized, not a differentiator

Partial · Action item extraction · Basic only, no ownership logic

Missing · Post-meeting execution flow · Zero tools cover this

Missing · Cross-meeting memory · Every meeting treated as isolated

Partial · Trust & approval controls · Binary on/off only, no spectrum

The strategic gap: every tool stops at the meeting. The actual value, moving from decision to execution, is completely unaddressed. That's where Zoom could win.

Mapping the full journey

I mapped the experience for both a meeting host and a participant across three stages: before, during, and after. The pattern was stark. Almost all the pain lived in the "after."

The "before" was about walking in blind. The "during" was mostly fine, except for unclear ownership. The "after" was where everything broke: slow follow-up, forgotten tasks, no one tracking anything.

What I built, what I cut

One filter for v1: does this directly shorten the gap between decision and action?

Built (v1)

Pre-meeting brief, live co-pilot, execution handoff, approval flow, cross-meeting memory, autonomy dial

Next (v2)

Jira/Linear/Slack integrations, manager team-view dashboard, trust calibration learning, meeting health scoring

Cut

Real-time AI voice participation (needs its own trust framework), AI scheduling (scope creep), public meeting rooms (different use case)

Key decisions

Four calls that defined the design.

Each one had a wrong version first.

Real-time capture vs. post-meeting processing

Post-meeting summary → Live sidebar capture

My first version processed everything after the call ended. Cleaner, simpler, no risk of disrupting the meeting. But user testing killed it. People came out of meetings with strong, fresh context. By the time the summary arrived 5 minutes later, they'd already mentally moved on. The moment had passed.

Real-time capture lets you correct the AI while the context is still live. "No, that's not a decision — that's still open." That correction improves every output downstream.

Trade-off accepted: The live sidebar adds a small cognitive load during meetings. We made it opt-in, kept it minimal — one-tap interactions only, no typing required during the call.

Full autonomy vs. human-in-the-loop

Fully autonomous agent → Approval-first with earned autonomy

The fully autonomous version was technically impressive and emotionally wrong. In testing, users consistently said: "I don't mind if it does things — I just want to know it did them." That's not a request for autonomy. That's a request for transparency.

The approval flow isn't a limitation. It's the product. Users who spend two weeks approving the agent's suggestions end up more comfortable giving it autonomy than users who never had to engage with it at all.

One unified agent vs. specialized agents for each stage

Three separate agents → One agent, three modes

We debated whether to build separate agents for pre-meeting, live, and post-meeting — each optimized for its context. Technically cleaner. The problem: it felt like three different products. Three different UIs. Three different mental models.

One agent, three modes. The same entity in your pre-meeting brief sits in your sidebar during the call and generates your handoff afterwards. Continuity creates trust.

Trade-off accepted: A unified agent is harder to build. The engineering team pushed back. We held the line because the UX cost of a fragmented experience was higher than the technical cost of integration.

Active interruptions vs. ambient awareness during meetings

Interrupt-on-detect → Silent capture with on-demand review

Early versions had the agent interrupt the meeting to flag things: a pop-up, a sound, something that demanded attention. Every single test participant said some version of "I wish it wouldn't do that." The meeting is the primary task. The agent is secondary.

The final model: the agent captures silently. You glance at it when you want. It never demands your attention mid-sentence. After the call, everything it flagged is waiting for you — organized, contextual, ready to act on.

Target metrics that tell the real story

If this ships, here's what we'd measure.

The north star was simple: does the agent reduce the gap between decision and action?

These are design-time targets — what success looks like if this ships, not measured outcomes.

North Star Metric

Action item completion rate per meeting

If decisions made in meetings are getting acted on and tracked, everything else follows. This is the one number that proves the agent is doing its job.

Action item completion rate · The core problem, solved · ~44% → >80%

Time-to-first-action after meeting · While context is still fresh · 24+ hrs → <2 hrs

Meeting recurrence for same topic · Fewer follow-up-to-follow-up calls · 3–4× avg → −40%

AI approval rate (trust proxy) · If users approve without editing, it's earning its place · unknown → >70%

Target Business Outcomes

>80%

Action item completion

Up from the ~44% baseline.

<2 hrs

Time to first action

Most actions start same day, down from 24+ hrs (next day).

−40%

Repeat meetings, same topic

Fewer follow-up calls than the current recurrence.

What's next

The foundation is laid. Here's what it enables.

Deep tool integrations

Right now, the agent lives inside Zoom. The real power comes when it pushes action items to Jira, creates Linear tickets, sends Slack messages, and updates Notion docs. The execution handoff becomes truly automated — not just a draft you copy and paste.

Manager team-view dashboard

The IC sees their items. The team lead sees their team. But managers are still flying blind. A team-level view showing action item completion rates, recurring blockers, and meeting health across their entire org is the next big surface — and where the B2B business case lives.

Trust calibration that evolves

The autonomy dial is set manually today. The next version learns. If you keep approving a certain type of suggestion without editing it for 3 weeks, the agent suggests moving up an autonomy level — with your data as the evidence. Trust earned, measured, and surfaced.
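The suggestion rule could look something like the sketch below. It is speculative: `Review`, `should_suggest_level_up`, and the `MIN_REVIEWS` floor are hypothetical, and only the 3-week window comes from the description above. Note that it returns a suggestion signal; the user would still have to accept the change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

STREAK_WINDOW = timedelta(weeks=3)  # from the text: 3 weeks of unedited approvals
MIN_REVIEWS = 5                     # hypothetical floor so one approval is not a streak


@dataclass(frozen=True)
class Review:
    action_type: str
    at: datetime
    outcome: str  # "approved", "edited", or "dismissed"


def should_suggest_level_up(history: list[Review], action_type: str,
                            now: datetime) -> bool:
    """True when every recent review of this action type was a clean approval."""
    relevant = [r for r in history if r.action_type == action_type]
    if not relevant:
        return False
    # The pattern must span the whole window, not just start inside it.
    if now - min(r.at for r in relevant) < STREAK_WINDOW:
        return False
    recent = [r for r in relevant if now - r.at <= STREAK_WINDOW]
    return len(recent) >= MIN_REVIEWS and all(r.outcome == "approved" for r in recent)
```

Keeping the check per `action_type` preserves the idea that autonomy is task-specific, and the list of recent reviews doubles as the evidence shown alongside the suggestion.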

Honest reflection

What I'd do differently if I started today.

The hardest part of this project wasn't designing the AI features. It was designing for a world where users are still figuring out what they want from AI. Expectations are all over the place — some people want it to do everything, some want it to do nothing. Designing for that range without picking a lane is genuinely difficult.

I underestimated how much the trust problem would dominate the process. I thought I'd spend most of my time on information architecture — what to capture, how to display it. Instead, I spent the most time on the approval flow and the autonomy model. The features were the easy part. The relationship between the user and the agent was the hard part.

If I started over, I'd prototype the approval flow first — before any of the screens, before any of the capture logic. The trust layer is the foundation everything else sits on. Getting it wrong makes everything else feel wrong too.

I'd also interview more ICs earlier. I over-indexed on team leads because they were easier to reach. But the people whose experience changes most with this tool are the participants — the people who walk out of meetings not totally sure what they just agreed to. Designing for them earlier would have made the live co-pilot much better from the start.

Meetings aren't the problem. The silence after them is.

SideDoor was about fixing a broken channel. This was about fixing a broken loop — the one between deciding something and actually doing it. Zoom already owns the meeting. This was about making what happens after it just as good.

If you made it this far, thank you. Always happy to talk about this one. ❤️