How to Write a Usability Test Script for B2B SaaS

The first task on many B2B SaaS scripts reads “Explore the dashboard and tell us what you think.” Sessions that open that way return rambling commentary and almost no usable signal. The participant talks for 45 minutes about colors, and nobody can name a thing to fix on Monday. The testing tool is not the problem. The usability test script is what decides what comes back.

According to the Maze Future of User Research 2026 report, 39% of user research is now run by product managers rather than dedicated researchers, and fewer than half of the organizations doing this train the non-researchers who run it. Scripts written by people learning on the job tend to default to the dashboard-explore pattern because it sounds open-ended and safe.

The script has three parts that decide whether a session produces fixes or footage: the task scenario, the follow-up prompts, and the observer notes format.

What a usability test script contains

The task scenario tells the participant what they need to accomplish without telling them where to click. The follow-up prompts give the facilitator (or, in an unmoderated session, the recorded prompt list) questions that turn behavior into evidence. The observer notes capture what happened in a structure that someone who was not in the room can read.

For B2B SaaS specifically, a fourth piece helps: a short role-and-context setup. Enterprise users arrive with mental models from Salesforce, Jira, and a dozen internal tools. Two sentences (“You are a customer success manager. You joined this platform three weeks ago.”) cut down on sessions where participants spend their time comparing your product to the tool they use at work. For the persona work that makes this realistic, see how to write a usability testing persona for B2B SaaS.

Tasks that describe a goal, not a route

A task scenario should describe what the user is trying to do, not how to do it. The distinction is the difference between testing the product and testing whether the participant can follow directions.

Compare these two for the same flow.

Route-based: “Click the Reports tab and export the Q1 revenue data as a CSV.”

Goal-based: “You need to share last quarter’s revenue performance with your finance team. Show us how you’d do that.”

The goal-based version reveals whether the user can find the Reports tab, understand what they are looking at, and pick the right export format. The route-based version reveals whether they can read.

For B2B SaaS, frame tasks around business outcomes rather than UI actions. Real users are onboarding a new customer, prepping a slide for Thursday’s QBR, or configuring a workflow before a teammate’s first day. If the task reads like a product spec, the participant follows the spec.

A working sequence for writing one task:

1. Write the business goal in one sentence.
2. Add the role context and the stakes (“you need this before Thursday’s board call”).
3. Strip every word that names a specific tab, button, or feature.

Once the role context and stakes are in, keep the task to 40 to 60 words. Longer tasks push too much into working memory, which changes how the participant navigates. Much shorter tasks tend to underspecify the situation.
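If your team keeps task drafts in a shared doc or repo, the three steps and the word-count guideline can double as a quick pre-flight check. The sketch below is a minimal Python example, assuming you maintain your own list of UI terms for your product; UI_TERMS, check_task, and TaskCheck are hypothetical names, not part of any testing tool. It matches whole words only, so it catches exact feature names but not synonyms.

```python
import re
from collections.abc import Sequence
from dataclasses import dataclass

# Hypothetical example list: replace with the tab, button, and feature names in your own product.
UI_TERMS = ("reports tab", "export", "csv", "filter", "configuration hub", "click", "button")

MIN_WORDS, MAX_WORDS = 40, 60  # the 40-to-60-word range for a finished task


@dataclass
class TaskCheck:
    word_count: int
    length_ok: bool
    ui_terms_found: list[str]


def check_task(text: str, ui_terms: Sequence[str] = UI_TERMS) -> TaskCheck:
    """Flag a task draft that falls outside the word range or names UI elements."""
    # Count words, keeping contractions like "you'd" or "quarter’s" as one word.
    words = re.findall(r"[A-Za-z0-9’']+", text)
    lowered = text.lower()
    # Whole-word, case-insensitive match for each UI term, reported in list order.
    found = [term for term in ui_terms if re.search(rf"\b{re.escape(term)}\b", lowered)]
    return TaskCheck(
        word_count=len(words),
        length_ok=MIN_WORDS <= len(words) <= MAX_WORDS,
        ui_terms_found=found,
    )
```

Run against the two tasks compared earlier, the route-based version flags “reports tab”, “export”, “csv”, and “click”. The goal-based version comes back with no UI terms but only 18 words, which is the signal to add the role context and stakes before it goes into a session.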

Follow-up prompts that surface why

Follow-ups are what turn a click into evidence. Without them, you see what the user did. You do not see what they expected.

The most common mistake is asking leading questions. “Was the navigation clear?” signals that you expect clarity. “What were you looking for at that step?” does not.

When to ask	Prompt	What it reveals
After task completion	“What were you expecting to find before you clicked that?”	Mental model gaps
After a hesitation	“What would you need to see here to feel confident continuing?”	Missing trust signals or information
After a task failure	“What would you normally do at this point in your workflow?”	Workarounds and alternative expectations
End of session	“If this were part of your real workflow, what would have made you stop?”	Stakes and friction without leading

For B2B SaaS, the most useful follow-up is usually about the participant’s reference frame. Asking “How does this compare to what you expected?” after each task captures what the participant brought in from other tools, without naming a competitor and anchoring the answer.

Observer notes you can read in five minutes

Observer notes have one job: letting reviewers find what matters without watching full recordings. Anything that does not serve that job is overhead.

Four columns cover what is needed for B2B SaaS sessions:

Timestamp	Behavior	Expectation gap	Severity
When it happened	What the user did	What they expected vs. what was there	1, 2, or 3

The expectation gap column is the part most enterprise scripts skip. Users arrive with assumptions built over years of using other products. When yours works differently, that can be a usability failure or a communication failure. Good notes distinguish “the user could not find it” from “the user did not believe the system would behave the way they expected.”

A three-point severity scale is enough: 1 means observation only, 2 means the user hit a problem and recovered, and 3 means task failure or explicit frustration. Five-point scales need calibration the team will not do, and they produce ratings that disagree across reviewers.
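If observer notes end up in a spreadsheet export or a small script rather than a doc, the four columns and the three-point scale map onto a simple record type. The sketch below is a minimal Python example, assuming “mm:ss” timestamps; ObserverNote, Severity, triage, and to_seconds are illustrative names, not a standard format. triage keeps only the notes worth a reviewer’s five minutes (severity 2 and up by default) and lists failures first, then the rest in session order.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    OBSERVATION = 1  # observation only
    RECOVERED = 2    # user hit a problem and recovered
    FAILURE = 3      # task failure or explicit frustration


@dataclass
class ObserverNote:
    timestamp: str        # "mm:ss" into the recording (assumed format)
    behavior: str         # what the user did
    expectation_gap: str  # what they expected vs. what was there ("" if none)
    severity: Severity


def to_seconds(ts: str) -> int:
    """Convert an "mm:ss" timestamp to seconds so notes sort in session order."""
    minutes, seconds = ts.split(":")
    return int(minutes) * 60 + int(seconds)


def triage(notes: list[ObserverNote], min_severity: Severity = Severity.RECOVERED) -> list[ObserverNote]:
    """Return notes at or above min_severity, most severe first, then by timestamp."""
    kept = [n for n in notes if n.severity >= min_severity]
    return sorted(kept, key=lambda n: (-n.severity, to_seconds(n.timestamp)))
```

Using IntEnum keeps severity comparable as a number while the names spell out what each level means, so no reviewer has to remember whether 3 is good or bad.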

Five mistakes that produce data nobody can use

1. Writing tasks that lead to the answer. “Use the filter to narrow the results” tells the participant where to look. Write the goal instead.

2. Using internal product names. Your team calls it “Configuration Hub.” Users would type “settings.” Using your name in the task tests recall of your name, not findability.

3. Stuffing too many flows into one session. A 45-minute session supports two or three tasks well. More tasks produce fatigue and rushed clicks. One UserTesting reviewer on Capterra noted in July 2025 that “some of the responses can feel hurried or low-effort.” Overstuffed scripts produce that result regardless of which platform runs them.

4. Skipping the role context. Without it, B2B participants navigate as generic users rather than as the role your product is built for, and the findings reflect generic-user friction, not your user’s friction.

5. Treating the script as fixed after session one. After the first run, tasks that confused the participant before they started need rewording. Tasks that produced identical behavior across all participants were probably too prescriptive.
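The second case in mistake 5 is easier to spot if click paths are logged per participant and per task. The sketch below is a minimal Python example that assumes a hypothetical nested mapping of participant ID to task ID to an ordered list of screens or clicks; too_prescriptive is an illustrative name. It flags tasks where every participant who attempted them took exactly the same path. A flag is a prompt to reread the task wording, not proof that the task gave the route away.

```python
from collections import defaultdict


def too_prescriptive(paths: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return task IDs where every participant followed the identical click path.

    paths maps participant ID -> task ID -> ordered list of screens or clicks
    (a hypothetical logging format).
    """
    by_task: dict[str, list[tuple[str, ...]]] = defaultdict(list)
    for tasks in paths.values():
        for task_id, steps in tasks.items():
            # Tuples are hashable, so identical paths collapse in a set below.
            by_task[task_id].append(tuple(steps))
    return sorted(
        task_id
        for task_id, observed in by_task.items()
        # Need at least two attempts before "everyone did the same thing" means anything.
        if len(observed) > 1 and len(set(observed)) == 1
    )
```

Tasks with a single attempt are skipped, since one path says nothing about variation across participants.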

Where the script work actually lives

The task scenarios, prompts, and notes structure above are reusable. What is not reusable is the writing time itself. Testing on a sprint cadence means writing a fresh script every two weeks, and that blank-page tax is the reason most teams do not test that often.

Tessary’s AI-Assisted Setup turns a one-line description of the flow and user type into goal-based tasks and follow-up prompts you can review and adjust. Personas then run the script against your Figma prototype or live URL in a real browser, and findings come back as structured issues with screenshots and step traces. That compresses script-to-findings into the same afternoon.

Try Tessary on your next usability test