
Reverse Engineering a Frontend via Public Source Maps


I was building a production crawler for a large e-commerce retailer - one that needed to serve reliable product data across hundreds of thousands of SKUs. The site had no public API documentation. The endpoints I needed were buried somewhere in a React frontend served by Next.js, and I had to figure out the entire request surface before I could design a crawler architecture that wouldn't break every time they deployed.

My usual approach is to capture a full HAR file by browsing the site, then analyze the network traffic to catalog endpoints. That works well for most sites. But this platform had dozens of internal services, feature-flagged routes, and API clients generated from OpenAPI specs. The HAR only showed me what the pages I visited actually called. I needed the complete picture so I could target clean, structured API endpoints instead of relying on HTML parsing that could break with any frontend change. So I went after the source maps.

Why source maps exist and why they're a liability

Source maps are JSON files that map minified production JavaScript back to the original source code. Browsers use them to show readable stack traces and original files in DevTools. The relevant fields are sources (original file paths) and sourcesContent (an optional array that, when the build embeds it, holds the full, un-minified source - complete with type annotations, comments, and import paths).
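
To make those two fields concrete, here is a minimal sketch of pairing each entry in sources with its embedded sourcesContent. The names RawSourceMap, RecoveredFile, and recoverSources are illustrative, and the sample map is made up; entries without embedded content are simply skipped.

```typescript
// Minimal shape of a v3 source map; only the fields used here are typed.
interface RawSourceMap {
  version: number;
  sources: string[];
  sourcesContent?: (string | null)[];
  mappings: string;
}

interface RecoveredFile {
  path: string;
  content: string;
}

// Pair each original path with its embedded source. Entries whose content
// was not embedded (missing array or null entry) are skipped.
function recoverSources(mapJson: string): RecoveredFile[] {
  const map = JSON.parse(mapJson) as RawSourceMap;
  const contents = map.sourcesContent ?? [];
  const files: RecoveredFile[] = [];
  map.sources.forEach((path, i) => {
    const content = contents[i];
    if (typeof content === "string") {
      files.push({ path, content });
    }
  });
  return files;
}

// Made-up sample: two sources, only the first has embedded content.
const sampleMapJson = JSON.stringify({
  version: 3,
  sources: ["webpack://_N_E/./src/a.ts", "webpack://_N_E/./src/b.ts"],
  sourcesContent: ["export const a = 1;", null],
  mappings: "",
});
const recoveredFiles = recoverSources(sampleMapJson);
```

If a map ships without sourcesContent, this returns an empty list: the paths are still visible, but the code itself is not.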

Most teams generate them. A surprising number ship them to production, usually so that error monitoring tools like Sentry can produce readable stack traces without a separate upload step. That's a valid operational tradeoff - but it means the complete source code is publicly accessible to anyone who knows where to look. In March 2026, Anthropic accidentally published source maps in their Claude Code npm package, exposing 512,000 lines of TypeScript including unreleased features and internal architecture. A missing .npmignore entry was all it took.

The architectural decision: complete enumeration vs. sampling

Next.js serves JavaScript from /_next/static/chunks/. When source maps are enabled, each chunk has a corresponding .map file at the same URL with .map appended. The naive approach - crawl pages, regex out chunk URLs from HTML - found under a hundred chunks on this site. That covers what's referenced on pages you actually visit.
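
A rough sketch of that naive approach, assuming chunk references appear as plain paths in the HTML. The function names and the shop.example origin are placeholders, and the regex is a simplification rather than a full HTML parser.

```typescript
// Collect unique chunk paths referenced in a page's HTML.
// Assumption: references look like /_next/static/chunks/....js inside
// attributes or inline scripts.
function extractChunkPaths(html: string): string[] {
  const pattern = /\/_next\/static\/chunks\/[^"'\s<>?]+\.js/g;
  const matches = html.match(pattern) || [];
  // Keep first occurrences only, preserving document order.
  return matches.filter((path, i) => matches.indexOf(path) === i);
}

// The source map lives at the chunk URL with ".map" appended.
function toMapUrl(siteOrigin: string, chunkPath: string): string {
  return siteOrigin.replace(/\/+$/, "") + chunkPath + ".map";
}

// Made-up page fragment: the same chunk is referenced twice.
const samplePageHtml =
  '<script src="/_next/static/chunks/main-abc123.js" defer></script>' +
  '<script src="/_next/static/chunks/pages/index-def456.js"></script>' +
  '<link rel="preload" href="/_next/static/chunks/main-abc123.js" as="script">';

const pageChunkPaths = extractChunkPaths(samplePageHtml);
const pageMapUrls = pageChunkPaths.map((p) => toMapUrl("https://shop.example/", p));
```

Whether a given .map URL actually resolves still has to be checked per chunk; a 404 just means that chunk's map was not published.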

But I needed everything: lazy-loaded routes, admin panels, feature-flagged code paths, dynamically imported modules. The reason is architectural - if you're building a crawler that needs to survive deployments, you need to understand the full API surface, not just the endpoints visible from the homepage. Otherwise you're designing against an incomplete contract.

Exploiting the webpack runtime for complete coverage

Webpack embeds a chunk registry in one of the initial bundles. It has to - the runtime needs to resolve any chunk on demand, so it carries a mapping from chunk ID to content hash for every chunk it may load lazily:

// Inside webpack-runtime-xyz.js (minified); hashes are illustrative
{
  2934: "a1b2c3d4",
  2935: "e5f6a7b8",
  3801: "c9d0e1f2",
  // ... hundreds more
}

Parsing this registry produced several hundred chunk IDs - roughly 4x what page crawling found. That delta isn't noise. It's code-split bundles, conditional imports, and routes that only load for specific user states. For my purpose, those hidden chunks were the most valuable because they contained the internal API clients and service integrations that the public-facing pages never expose.
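
One way to parse such a registry is a regex pass over the runtime file's text. This is a sketch under stated assumptions, not the exact parser used here: it assumes numeric chunk IDs, lowercase hex hashes of at least eight characters, and a chunk path of the form static/chunks/<id>.<hash>.js. The exact layout depends on the webpack and Next.js versions, so check it against the real runtime before relying on it.

```typescript
// Extract id -> hash pairs from the minified runtime text.
// Assumption: entries look like 2934:"a1b2c3d4" inside an object literal,
// so each one is preceded by "{" or ",".
function parseChunkRegistry(runtimeJs: string): Record<string, string> {
  const registry: Record<string, string> = {};
  const entry = /[{,]\s*(\d+):\s*"([0-9a-f]{8,})"/g;
  let m: RegExpExecArray | null;
  while ((m = entry.exec(runtimeJs)) !== null) {
    const id = m[1];
    const hash = m[2];
    if (id !== undefined && hash !== undefined) {
      registry[id] = hash;
    }
  }
  return registry;
}

// Assumed URL layout for lazily loaded chunks; verify against the site.
function chunkMapPath(id: string, hash: string): string {
  return "/_next/static/chunks/" + id + "." + hash + ".js.map";
}

// Made-up runtime fragment in the shape described above.
const sampleRuntimeJs =
  'a.u=function(e){return"static/chunks/"+e+"."+' +
  '{2934:"a1b2c3d4",2935:"e5f6a7b8",3801:"c9d0e1f2"}[e]+".js"}';

const chunkRegistry = parseChunkRegistry(sampleRuntimeJs);
const registryMapPaths = Object.keys(chunkRegistry).map((id) =>
  chunkMapPath(id, chunkRegistry[id] || "")
);
```

Named chunks with string keys are ignored by this pattern; a production version would need to handle them, ideally by parsing the object literal rather than matching it.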

What the source maps revealed about the platform

The extraction produced thousands of original source files. But what mattered wasn't the volume - it was what I could learn about their architecture:

  • Dozens of API endpoints from OpenAPI-generated clients - fully typed, with request parameters, response shapes, and authentication requirements. No reverse engineering needed.
  • Several internal packages under a scoped namespace - their component library, analytics wrapper, auth utilities, and a shared types package. This told me how they structure their frontend services.
  • API versioning patterns - I could see which endpoints were on v1 vs v2, which were deprecated, and which had experimental flags.
  • Authentication flow - the auth utility package revealed token refresh logic, session management, and which endpoints required which auth levels.

// Example: what an extracted OpenAPI-generated client looks like
export class CatalogService {
  public static getItem(
    id: string,
    options?: { locale?: string; includeMetadata?: boolean }
  ): CancelablePromise<ItemResponse> {
    return __request(OpenAPI, {
      method: "GET",
      url: "/api/v2/catalog/{id}",
      path: { id },
      query: {
        locale: options?.locale,
        include_metadata: options?.includeMetadata,
      },
    });
  }
}

This is what informed the crawler architecture. Instead of reverse-engineering endpoints one by one through HAR analysis or HTML parsing, I had the full typed contract. I could design request patterns, pagination logic, and error handling against documented interfaces rather than guessing from network traffic.
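
As an illustration of designing requests against that contract, the sketch below expands a path template like the one in the extracted client and appends the defined query parameters. buildRequestUrl and the shop.example base URL are hypothetical; only the template and parameter names come from the example above.

```typescript
type QueryValue = string | number | boolean | undefined;

// Fill {placeholders} in a path template and append defined query params.
function buildRequestUrl(
  base: string,
  template: string,
  pathParams: Record<string, string>,
  query: Record<string, QueryValue>
): string {
  const path = template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = pathParams[key];
    if (value === undefined) {
      throw new Error("missing path parameter: " + key);
    }
    return encodeURIComponent(value);
  });
  // Undefined values are dropped, mirroring optional parameters.
  const pairs = Object.keys(query)
    .filter((k) => query[k] !== undefined)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(String(query[k])));
  return base + path + (pairs.length > 0 ? "?" + pairs.join("&") : "");
}

const catalogItemUrl = buildRequestUrl(
  "https://shop.example",
  "/api/v2/catalog/{id}",
  { id: "sku 42" },
  { locale: "en-US", include_metadata: true, unused: undefined }
);
```

Keeping the template string identical to the one in the extracted client makes it easy to spot when a later extraction shows the path or parameters changing.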

Operational use: deployment detection for crawler resilience

The second architectural decision was using source maps for ongoing monitoring. Chunks are content-hashed - when source changes, hashes change. Running the extraction periodically and diffing against the previous run gives you a deployment changelog:

$ srcmap diff --previous ./runs/2026-05-01
12 chunks changed since last run
  src/services/api/CatalogService.ts     +15 -0   (new endpoint added)
  src/components/Checkout/Payment.tsx    +42 -17
  src/hooks/useFeatureFlag.ts            +8  -3

When you own a production crawler processing hundreds of thousands of products, API changes are the primary failure mode. An endpoint gets renamed, a required parameter gets added, a response shape changes - and your crawler silently returns bad data or stops entirely. I've debugged incidents where a retailer changed their API endpoints while simultaneously adding bot protection, and it took weeks to notice the data had gone stale.

Source map diffing gives you a leading indicator, not a guarantee. But more often than not, you see the API client change in their source before the old endpoint gets deprecated. That's the difference between proactive adaptation and incident response.
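
The chunk-level half of that diff can be sketched as a comparison of two ID-to-hash registries from consecutive runs. This is not the srcmap tool shown above, just a minimal illustration with hypothetical names; it reports which chunks to re-extract, and the per-file line counts would come from diffing the recovered sources afterwards.

```typescript
interface RegistryDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

// Compare two chunk registries (chunk ID -> content hash).
function diffRegistries(
  previous: Record<string, string>,
  current: Record<string, string>
): RegistryDiff {
  const added: string[] = [];
  const removed: string[] = [];
  const changed: string[] = [];
  for (const id of Object.keys(current)) {
    if (!(id in previous)) {
      added.push(id);
    } else if (previous[id] !== current[id]) {
      changed.push(id);
    }
  }
  for (const id of Object.keys(previous)) {
    if (!(id in current)) {
      removed.push(id);
    }
  }
  return { added: added.sort(), removed: removed.sort(), changed: changed.sort() };
}

// Made-up registries from two runs.
const previousRun = { "2934": "a1b2c3d4", "2935": "e5f6a7b8", "3801": "c9d0e1f2" };
const currentRun = { "2934": "a1b2c3d4", "2935": "0f1e2d3c", "4100": "9a8b7c6d" };
const runDiff = diffRegistries(previousRun, currentRun);
```

Chunk IDs are not guaranteed to stay stable across builds, so an added/removed pair may be the same code under a new ID; comparing recovered source paths is the more reliable signal.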

The tradeoffs I chose not to make

I could have stuck with my usual approach - HAR capture, network analysis, endpoint cataloging through traffic observation. That approach has advantages: it captures runtime behavior, shows actual request patterns, and works on any site regardless of framework.

I chose static analysis because the crawler needed to be resilient to change, not just functional today. HAR capture and browser automation show what the site does now. Source map analysis captures what the site can do, including unreleased features, experimental endpoints, and internal tooling that haven't been exposed to users yet. For a crawler that needs to survive long-term platform evolution, that forward visibility is worth more than a snapshot of current behavior.

On the ethics of reading public files

Source maps served over public URLs without authentication are publicly accessible. Browsers download them automatically when you open DevTools. There is no access control to bypass and no credentials required. Disabling them in Next.js is a single config line:

// next.config.js
module.exports = {
  productionBrowserSourceMaps: false,
};

The takeaway for teams: if you ship source maps to production, audit what's in them. Generated API clients, internal package source, environment-specific config, and infrastructure URLs all end up in the bundle. That's fine if you've made that decision intentionally. Most teams haven't.