Where AI is genuinely strong
AI is excellent at translating natural language into structured test steps. Give it a description of a user journey and it will produce a reasonable first-pass scenario faster than any engineer could write it manually. It reliably handles the mechanical work of turning 'user logs in' into explicit click, fill, and expect steps.
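To make that concrete, here is a minimal sketch of what those explicit steps typically look like once expanded, written as a Playwright test in TypeScript. The URL, field labels, and post-login assertion are placeholders for illustration, not details of any real application.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical expansion of 'user logs in' into explicit steps.
// The URL, labels, and post-login heading are placeholders.
test('user logs in', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('a-valid-password');
  await page.getByRole('button', { name: 'Log in' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```

Each step is mechanical on its own; the value of AI here is producing the full, ordered sequence from a one-line description.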
AI also helps with coverage expansion. Once a team has one or two scenarios working, AI can rapidly draft variations: happy paths, edge cases, error states, and alternative flows. That kind of broad first-pass coverage used to require significant manual effort.
- Drafting first-pass scenarios from plain-English user stories
- Expanding a rough flow description into explicit, ordered steps
- Generating coverage for new features quickly as part of the development cycle
- Updating scenarios after UI changes when given a description of what changed
- Suggesting edge cases and error paths a human author might overlook (one such variation is sketched below)
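As an illustration of that last point, here is a hedged sketch of the kind of error-path variation AI can draft once the happy-path login scenario above exists. The error message and URL pattern are assumptions, not details of a real product.

```typescript
import { test, expect } from '@playwright/test';

// Error-state variation on the hypothetical login flow sketched earlier.
// The error copy and the /login URL pattern are assumed for illustration.
test('login rejects invalid credentials', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('not-the-right-password');
  await page.getByRole('button', { name: 'Log in' }).click();
  await expect(page.getByText('Invalid email or password')).toBeVisible();
  // The journey should not have advanced past the login page.
  await expect(page).toHaveURL(/\/login/);
});
```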
Where AI consistently falls down
AI-generated tests that live only in chat are disposable by definition. They cannot be rerun, reviewed, versioned, or maintained. They are useful for a single moment and then gone. This is the most common failure mode for teams that adopt AI tooling for testing: the generation is fast, but the workflow around it does not exist, so nothing durable gets built.
AI also struggles when it lacks application-specific context. It does not know that your 'Continue' button only appears after a specific validation step, or that your checkout flow has a guest-only path that behaves differently, or that your login form has an unusual two-step process. Without accurate context, it generates plausible-looking tests that do not reflect the actual product.
The fix for both problems is the same: put the AI inside a workflow that writes to durable artifacts in the repo, and give it access to enough application context to generate accurately.
The workflow that makes AI output durable
The strongest pattern is AI-assisted generation into a repo-native scenario file. A developer or AI coding agent requests a test, the scenario is written directly into the project, reviewed in a pull request, and then committed, executed, and maintained as part of normal development.
This approach preserves all the speed advantages of AI generation while adding the ownership and review discipline that makes a test suite trustworthy over time. The scenario file becomes a real project asset — something the team can point to, update, and rely on — rather than a one-time chat output.
How Assert fits into an AI-assisted workflow
Assert's MCP server lets AI coding agents — Claude, Cursor, Windsurf, and others — write scenarios directly into the repo, run them against the live application, and inspect the results without leaving the agent session. The agent does not generate disposable code into chat; it creates a real Assert scenario file, saves it to the project, and can rerun it later.
That distinction matters. A test written through the Assert MCP is a first-class project artifact from the moment it is created. It can be reviewed, committed, scheduled, and updated — just like anything else in the repo.
FAQ
Can AI fully automate QA?
Not reliably, and teams should be skeptical of claims that it can. AI is excellent at drafting and accelerating test creation, but it lacks the product judgment to decide what coverage is sufficient, what risk is acceptable, and what expected behavior is correct. Human QA judgment remains essential for coverage strategy, risk prioritization, and acceptance criteria.
What makes AI-generated tests useful over the long term?
They need to land in the repo as versioned, reviewable files rather than staying in a chat window. The generation is fast; the discipline of treating the output as a real project artifact is what makes the investment durable. AI tests that are committed, reviewed, and maintained like any other code age well. AI tests that exist only in session history disappear.
Why generate Markdown scenarios instead of raw Playwright code?
Because Markdown keeps the intent readable after generation. When an AI writes raw Playwright code, the output is often correct but opaque — hard for a reviewer to validate quickly and hard for a future maintainer to update confidently. When the AI writes a readable scenario, any engineer on the team can review it in seconds and update it when the product changes.
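For contrast, here is a sketch of the raw generated Playwright output a reviewer would have to parse instead; every selector and value is an invented placeholder. A readable scenario step such as 'Apply the discount code and confirm the total updates' states the same intent in a form anyone on the team can check at a glance.

```typescript
import { test, expect } from '@playwright/test';

// Raw generated Playwright output: likely correct, but the intent is
// buried in selectors. All selectors and values here are placeholders.
test('checkout discount', async ({ page }) => {
  await page.goto('https://example.com/checkout');
  await page.locator('[data-testid="promo-input"]').fill('SAVE10');
  await page.locator('[data-testid="promo-apply"]').click();
  await expect(page.locator('[data-testid="order-total"]')).toContainText('$90.00');
});
```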
Put the workflow in your repo, not in a chat transcript
Assert is strongest when scenarios become durable project assets: readable Markdown in the repo, generated execution code underneath, and result inspection in the dashboard.