Flaky tests: root causes & fixes — Real‑World Case Study — Practical Guide (Nov 30, 2025)

Level: Intermediate

30 November 2025

Introduction

Flaky tests — tests that produce inconsistent outcomes without changes to the code — remain a major productivity killer in software engineering. They erode trust in the test suite and slow delivery by requiring repeated reruns and investigations.

In this article, we explore common root causes of flaky tests based on a real-world company case study using JUnit 5 (v5.9.x) and Playwright (v1.40+) in a mixed Java and TypeScript microservices environment as of late 2025. We provide actionable fixes, practical advice for validation, and a concise checklist for prevention.

Prerequisites

Basic knowledge of unit and integration testing
Familiarity with JUnit 5 for Java or Playwright Test Runner for TypeScript
Access to your test source code and CI/CD pipeline logs
Understanding of asynchronous programming concepts and concurrency

Hands-on: Identifying and Fixing Flakiness

1. Analyse flaky test reports and logs

Start by gathering failure artefacts — console logs, stack traces, timestamps, and environment data. Categorise flakes into groups such as:

Timing related (timeouts, race conditions)
Resource conflicts (shared state, file locks)
External dependencies (network, databases)
Test infrastructure (CI machine load, ephemeral failures)

This step informs targeted fixes rather than guesswork.
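
The categorisation step can be partly automated. The following is a minimal sketch, assuming failure messages have already been extracted from CI logs as plain strings. The category names, the regular expressions, and the helper names classifyFailure and summarise are illustrative assumptions, not part of the case study; tune the patterns against your own logs.

```typescript
// Hypothetical category names mirroring the four groups listed above.
type FlakeCategory = 'timing' | 'resource' | 'external' | 'infrastructure' | 'unknown';

// Illustrative patterns only; real logs will need project-specific rules.
const RULES: Array<{ category: FlakeCategory; pattern: RegExp }> = [
  { category: 'timing', pattern: /timeout|timed out|race condition/i },
  { category: 'resource', pattern: /file lock|already in use|EBUSY/i },
  { category: 'external', pattern: /ECONNRESET|ECONNREFUSED|HTTP (429|503)/i },
  { category: 'infrastructure', pattern: /out of memory|no space left/i },
];

// Return the first matching category, or 'unknown' so nothing is silently dropped.
function classifyFailure(message: string): FlakeCategory {
  for (const rule of RULES) {
    if (rule.pattern.test(message)) return rule.category;
  }
  return 'unknown';
}

// Count failures per category to see where to focus first.
function summarise(messages: string[]): Record<FlakeCategory, number> {
  const counts: Record<FlakeCategory, number> = {
    timing: 0, resource: 0, external: 0, infrastructure: 0, unknown: 0,
  };
  for (const message of messages) {
    counts[classifyFailure(message)] += 1;
  }
  return counts;
}
```

A large 'unknown' bucket usually means the rules are too narrow rather than that the failures are unclassifiable, so it is worth reading a sample of those messages by hand.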

2. Case Study: Timing and Race Conditions in Java JUnit Tests

The team noticed intermittent NullPointerException failures in seemingly unrelated tests, mostly integration tests that shared an in-memory database.

Root cause: tests did not properly await completion of asynchronous database population, causing subsequent tests to run against half-initialised fixtures.

Fix: Use JUnit 5’s @BeforeEach methods combined with CompletableFuture-based synchronisation to ensure fixtures are fully ready:

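import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import org.junit.jupiter.api.BeforeEach;

@BeforeEach
void setupFixture() throws Exception {
    // Assumes populateDatabase() blocks until the data is fully loaded
    CompletableFuture<Void> populateDbFuture =
            CompletableFuture.runAsync(() -> myFixture.populateDatabase());
    // Block until the fixture is ready; throws TimeoutException after 5 seconds
    populateDbFuture.get(5, TimeUnit.SECONDS);
}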

This prevents tests from executing before asynchronous preparation completes, eliminating the race condition.

3. Case Study: Shared State Issues in Playwright Tests

Playwright E2E tests ran in parallel on GitHub Actions runners, occasionally failing with errors related to global configuration collisions and inconsistent browser contexts.

Root cause: tests shared a global config file and session cookies without isolation, so state leaked between tests running in parallel.

Fix: Implement strict test session isolation by overriding the storageState fixture with Playwright’s test.extend() API:

import { test as base } from '@playwright/test';

const test = base.extend({
  storageState: async ({ browser }, use, testInfo) => {
    // Write a fresh, empty storage state to a per-test path so that
    // parallel workers never read or overwrite the same file
    const statePath = testInfo.outputPath('storage-state.json');
    const context = await browser.newContext();
    await context.storageState({ path: statePath });
    await context.close();
    await use(statePath);
  }
});

test('example test', async ({ page }) => {
  await page.goto('https://myapp.example.com');
  // test logic
});

Starting every test from a fresh storage state prevents cookies and sessions from leaking between tests.

4. External Dependency Flakes

Tests depending on external network services like REST APIs or databases regularly failed due to transient network delays or service throttling.

Fixes include:

Implementing local mocks or test doubles to decouple from unstable external systems.
Adding retries with exponential backoff for network calls during test setup.
Ensuring idempotent test data and clean teardown procedures to avoid inconsistent states.

Example retry snippet for Playwright test setup:

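async function fetchWithRetry(url: string, retries = 3): Promise<Response> {
  for (let i = 0; i < retries; i++) {
    try {
      // fetch rejects only on network errors; HTTP 429/5xx responses resolve normally
      return await fetch(url);
    } catch (err) {
      // Out of attempts: surface the last error to the caller
      if (i === retries - 1) throw err;
      // Exponential backoff: wait 1s, then 2s, then 4s, ...
      await new Promise(res => setTimeout(res, 1000 * 2 ** i));
    }
  }
  throw new Error('Unreachable');
}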
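
The first fix in the list above, replacing an unstable external system with a test double, can be sketched as follows. This is an illustration under stated assumptions: the User type, the UserClient interface, the greeting function, and FakeUserClient are hypothetical names invented for the example and do not come from the case study. The idea is that the code under test depends on an interface, so tests can pass in a deterministic in-memory implementation.

```typescript
// Hypothetical domain type and client interface, not from any real service.
interface User {
  id: string;
  name: string;
}

interface UserClient {
  getUser(id: string): Promise<User>;
}

// Code under test depends on the interface, not on a concrete HTTP client.
async function greeting(client: UserClient, id: string): Promise<string> {
  const user = await client.getUser(id);
  return `Hello, ${user.name}!`;
}

// In-memory test double: deterministic, no network, no throttling.
class FakeUserClient implements UserClient {
  private readonly users: Record<string, User>;

  constructor(users: Record<string, User>) {
    this.users = users;
  }

  async getUser(id: string): Promise<User> {
    const user = this.users[id];
    if (!user) throw new Error(`Unknown user: ${id}`);
    return user;
  }
}
```

In a test, something like new FakeUserClient({ '42': { id: '42', name: 'Ada' } }) would stand in for the real client. Keeping a smaller set of tests that exercise the real service helps guard against the over-mocking pitfall.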

Common Pitfalls

Lack of isolation: global state or shared fixtures not reset between tests.
Ignoring async operations: not awaiting promises or async calls inside setup or tests.
Over-mocking: leading to tests that pass but don’t reflect real behaviour.
Ignoring CI environment differences: tests must be resilient to CI-specific resource constraints.
Test order dependencies: relying on tests running in a specific sequence.
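
The first and last pitfalls often share one cause: a fixture that lives at module level. The sketch below contrasts that with a per-test factory. The CartFixture type and the createCart and addItem helpers are hypothetical names used only for illustration.

```typescript
// Hypothetical fixture shape for illustration.
interface CartFixture {
  items: string[];
}

// Anti-pattern: module-level state survives from one test to the next.
const sharedCart: CartFixture = { items: [] };

// Preferred: build a fresh fixture for every test so execution order cannot matter.
function createCart(): CartFixture {
  return { items: [] };
}

// Adds an item and returns the new cart size.
function addItem(cart: CartFixture, item: string): number {
  cart.items.push(item);
  return cart.items.length;
}
```

Two tests that each add one item to sharedCart and expect a cart size of 1 will pass or fail depending on which runs first. With createCart() both see a size of 1 regardless of order.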

Validation

After applying fixes:

Rerun the flaky suite multiple times locally and on CI to detect remaining nondeterminism.
Use JUnit 5’s @RepeatedTest annotation for repeated runs in CI:

@RepeatedTest(10)
void flakyTestRepeat() {
    // test code
}

For Playwright, increase parallelism and rerun test suites to expose residual flakes:

npx playwright test --repeat-each=5 --workers=4

Use flakiness reporting where your tooling provides it. For example, with retries enabled, Playwright's reporters label a test that passes only on retry as flaky, and Jenkins offers a Flaky Test Handler plugin.
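
Where no such tooling is available, the results of repeated runs can be analysed directly. The sketch below assumes pass/fail outcomes have already been collected from several runs of the same commit. The RunResult and FlakeStats shapes and the findFlakyTests helper are assumptions made for this example, not the output format of any particular reporter.

```typescript
// Hypothetical record of one test execution; adapt to your reporter's output.
interface RunResult {
  test: string;
  passed: boolean;
}

interface FlakeStats {
  test: string;
  runs: number;
  failures: number;
  failureRate: number;
}

// A test is treated as flaky here if it both passed and failed on the same code.
function findFlakyTests(results: RunResult[]): FlakeStats[] {
  const byTest = new Map<string, { runs: number; failures: number }>();
  for (const { test, passed } of results) {
    const entry = byTest.get(test) ?? { runs: 0, failures: 0 };
    entry.runs += 1;
    if (!passed) entry.failures += 1;
    byTest.set(test, entry);
  }

  const flaky: FlakeStats[] = [];
  byTest.forEach(({ runs, failures }, test) => {
    // Always-passing and always-failing tests are deterministic, so skip them.
    if (failures > 0 && failures < runs) {
      flaky.push({ test, runs, failures, failureRate: failures / runs });
    }
  });

  // Worst offenders first.
  return flaky.sort((a, b) => b.failureRate - a.failureRate);
}
```

A test that fails on every run is excluded on purpose: it is broken rather than flaky and needs a different kind of investigation.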

Checklist / TL;DR

✔ Identify flake type: timing, shared state, external dependency, CI resource
✔ Remove shared mutable state or reset it fully between tests
✔ Await all async operations to ensure test setup readiness
✔ Use test isolation APIs: JUnit @BeforeEach setup per test, Playwright fixtures via test.extend()
✔ Mock or stub unstable external services where feasible
✔ Add explicit retries with backoff for flaky network calls
✔ Use repeated runs with @RepeatedTest (JUnit) or --repeat-each (Playwright)
✔ Avoid test order dependencies; use @TestMethodOrder only if necessary and explicit
✔ Monitor flaky tests on CI to avoid ignoring regressions