Assert
Guide

How to Fix Flaky Tests: Root Causes and Lasting Solutions

Flaky tests are not a Playwright problem — they are a signal. A test that passes on CI and fails locally, or fails intermittently with no code change, is telling you something real about your app or your test setup. Suppressing the signal with retries is the cheapest fix in the short term and the most expensive one in the long run.

The most common causes of flaky Playwright tests

Timing issues are responsible for the majority of test flakiness. The test clicks a button before the network response has populated the page, or asserts on text before an animation completes. Playwright's auto-waiting mitigates this significantly, but it cannot account for every asynchronous event in your application.

The second major cause is test interdependence. Tests that share database state, session cookies, or browser storage can interfere with each other when run in parallel. A test that passes when run alone fails because a previous test left the app in an unexpected state.

Selector brittleness is the third category. Tests written against implementation details — CSS class names, element positions, auto-generated IDs — break whenever the page is refactored, even if the user experience is unchanged.

  • Race conditions between UI updates and test assertions
  • Animation or transition timing not accounted for in waits
  • Shared state between tests (database, cookies, localStorage)
  • Network latency variation in CI environments
  • Browser-specific rendering differences
  • Third-party scripts or widgets loading unpredictably
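As a sketch of the selector point above: prefer user-facing locators (role, accessible name, visible text) over class names and DOM position. The route and button labels here are hypothetical, not from any real app.

```typescript
import { test, expect } from '@playwright/test';

test('submit order', async ({ page }) => {
  await page.goto('/checkout'); // hypothetical route

  // Brittle: breaks if styling or DOM order changes.
  // await page.click('.btn.btn-primary:nth-child(2)');

  // Robust: tied to what the user sees, survives refactors.
  await page.getByRole('button', { name: 'Place order' }).click();
  await expect(
    page.getByRole('heading', { name: 'Order confirmed' })
  ).toBeVisible();
});
```

The role-based locator also fails loudly if the button loses its accessible name, which is usually a real accessibility regression worth catching.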

Fixing timing issues properly

The correct fix for timing issues is to wait for a meaningful application state, not to add a fixed sleep. page.waitForResponse() waits for a specific API call to complete. expect(locator).toBeVisible() with Playwright's built-in retry waits for the element to appear. These waits are specific to what the test actually needs, not an arbitrary delay.

Playwright's expect() assertions retry automatically up to the configured timeout. Use them instead of one-time checks. Replace expect(await page.textContent('.status')).toBe('Done') with await expect(page.getByText('Done')).toBeVisible() — the second form retries, the first does not.

  • Use await expect(locator).toBeVisible() over one-shot DOM reads
  • Wait for network responses with page.waitForResponse() when state depends on an API call
  • Use page.waitForLoadState('networkidle') sparingly — Playwright's documentation discourages it, and it is slow and unreliable on pages with polling or websockets
  • Never use page.waitForTimeout() in production tests — it is a symptom, not a fix
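The points above can be sketched in one test. The `/api/orders` endpoint and `/orders` route are assumptions for illustration; the pattern of registering the wait before the triggering action is the important part.

```typescript
import { test, expect } from '@playwright/test';

test('order list loads', async ({ page }) => {
  // Register the wait BEFORE the navigation that triggers the request,
  // so a fast response cannot complete before the listener is attached.
  const ordersResponse = page.waitForResponse(
    (res) => res.url().includes('/api/orders') && res.ok()
  );
  await page.goto('/orders'); // hypothetical route
  await ordersResponse;

  // Web-first assertion: retries until the row appears or the timeout hits.
  // No fixed sleep anywhere in the test.
  await expect(page.getByRole('row', { name: /Order #/ })).toBeVisible();
});
```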

Isolating test state

Each test should be able to run in any order and produce the same result. This means creating test-specific state (a fresh user, a clean cart) at the start of each test rather than relying on what a previous test left behind.

Playwright's beforeEach hooks and storageState API make per-test setup practical. For database state, a test data factory or API-based setup (POST to a test endpoint that seeds data) is more reliable than sharing state between tests.
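One way to make per-test state concrete is a small test-data factory: every test asks the factory for a fresh user instead of reusing a shared fixture. Everything here (the helper name, the domain) is illustrative, not a Playwright API; the seeded user would typically be created via an API call in a beforeEach hook.

```typescript
// Hypothetical test-data factory. A process-wide counter plus a timestamp
// guarantees unique identities even when tests run in parallel workers.
let counter = 0;

function makeTestUser(
  overrides: Partial<{ email: string; name: string }> = {}
) {
  counter += 1;
  const id = `${Date.now()}-${counter}`;
  return {
    email: `test-user-${id}@example.test`,
    name: `Test User ${id}`,
    ...overrides,
  };
}
```

In a Playwright suite, a beforeEach hook would POST this user to a test-only seeding endpoint, so each test starts from state it created itself.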

When to use retries — and when not to

Playwright's retry configuration (retries: 2 in playwright.config.ts) is a reasonable safety net for genuinely intermittent infrastructure issues. It is not a substitute for fixing the underlying cause. If a test needs more than one retry to be reliable, treat it as a failing test — investigate the cause rather than increasing the retry count.
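The retry setting lives in playwright.config.ts. A common pattern, sketched below, is to enable retries on CI only, so local runs surface flakiness immediately instead of hiding it:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry only on CI; locally a flaky test should fail loudly.
  retries: process.env.CI ? 2 : 0,
  // The HTML report marks tests that passed only after a retry as
  // "flaky", which gives you the tracking list mentioned above.
  reporter: [['html', { open: 'never' }]],
});
```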

FAQ

Why does my Playwright test pass locally but fail on CI?

CI environments are typically slower, have different network conditions, and run tests in parallel more aggressively. The most common causes are timing issues (animations, API responses, page loads that take longer on CI) and resource contention (tests interfering with each other when run concurrently). Start by running tests in headed mode or with --debug to see what the test is actually seeing when it fails.
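The debugging steps above map to standard Playwright CLI flags (assuming Playwright is installed in the project; the spec file path is a placeholder):

```shell
# Run one test file with the browser window visible.
npx playwright test tests/checkout.spec.ts --headed

# Step through the test with the Playwright Inspector.
npx playwright test tests/checkout.spec.ts --debug

# Record a trace, then inspect it after a failure.
npx playwright test --trace on
npx playwright show-trace test-results/*/trace.zip
```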

How many retries should I configure in Playwright?

One to two retries is a reasonable safety net for infrastructure flakiness. More than that is a signal you are masking a real problem. Track which tests use retries over time — they are your most expensive tests and usually the ones most worth fixing properly.

What is the difference between a flaky test and a broken test?

A broken test fails consistently because the application has a bug or the test has a logic error. A flaky test fails intermittently — same code, same environment, different outcome. Flakiness is usually a timing or state isolation problem, not a logic error. Both are worth fixing, but for different reasons.

Put the workflow in your repo, not in a chat transcript

Assert is strongest when scenarios become durable project assets: readable Markdown in the repo, generated execution underneath, and result inspection in the dashboard.