How to Use matchType in Wayback Machine CDX Queries

When using the Wayback Machine's CDX API, the matchType parameter controls how broad or narrow your search results are. This matters most when you are retrieving archived pages and want more than just exact matches.

By default, the CDX server only returns results for the exact URL you specify. But with matchType, you can expand that to include all pages under a path, all results from a host, or even everything from a domain and its subdomains.

Here’s a breakdown of how each option works.

matchType=exact

This is the default behavior. If you don’t specify any matchType, the CDX server looks for only exact matches to the URL you’ve entered.

Example:

 
http://web.archive.org/cdx/search/cdx?url=archive.org/about/&matchType=exact

This returns snapshots only for the exact URL archive.org/about/—not for archive.org/about/team or anything else under it.
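If you build these requests in code rather than by hand, a small helper can keep the parameters tidy. The sketch below is only an illustration: the name build_cdx_url is made up for this example, the endpoint is the one used in the examples here, and the function only assembles the query string without sending a request.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the examples in this article.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, match_type="exact"):
    """Return a CDX query URL for the given target URL and matchType.

    build_cdx_url is a hypothetical helper, not part of any library.
    """
    allowed = {"exact", "prefix", "host", "domain"}
    if match_type not in allowed:
        raise ValueError(f"unknown matchType: {match_type!r}")
    # safe="/" keeps slashes readable instead of encoding them as %2F.
    query = urlencode({"url": url, "matchType": match_type}, safe="/")
    return f"{CDX_ENDPOINT}?{query}"


print(build_cdx_url("archive.org/about/"))
```

Rejecting unknown values early is a design choice for this sketch, so a typo such as "prefx" fails before any request is made.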

matchType=prefix

Use this when you want to get all archived pages that start with a certain path.

Example:

 
http://web.archive.org/cdx/search/cdx?url=archive.org/about/&matchType=prefix

This will return not just archive.org/about/, but also archive.org/about/team, archive.org/about/history, and any other pages that begin with the same URL prefix.

It’s useful if you want to explore all content under a directory or section of a site.
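To get a feel for what prefix matching selects, you can simulate it locally on a handful of URLs. This is a rough sketch, assuming plain string comparison: the real CDX server compares its own canonicalized index keys rather than raw strings, and prefix_matches is a hypothetical name used only here.

```python
def prefix_matches(captured_urls, prefix):
    """Return the URLs that start with the given prefix (local simulation only)."""
    return [u for u in captured_urls if u.startswith(prefix)]


# Sample URLs borrowed from the examples in this article.
sample = [
    "archive.org/about/",
    "archive.org/about/team",
    "archive.org/about/history",
    "archive.org/contact/",
]

# archive.org/contact/ is left out because it does not share the prefix.
print(prefix_matches(sample, "archive.org/about/"))
```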

matchType=host

This option retrieves all archived content from a specific host, without including its subdomains.

Example:

 
http://web.archive.org/cdx/search/cdx?url=archive.org/about/&matchType=host

Although the query URL includes the /about/ path, the match is made on the host alone. The results will include URLs like archive.org/, archive.org/about/, and archive.org/contact/, but not content from subdomains like blog.archive.org.
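The host rule can be pictured as comparing hostnames and ignoring everything after them. The sketch below assumes that simplified view. host_of and same_host are hypothetical helpers, and the server applies its own URL canonicalization, which this example does not try to reproduce.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Extract the lowercase hostname from a URL, with or without a scheme."""
    if "://" not in url:
        # urlsplit only recognizes the host when the URL starts with "//".
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()


def same_host(candidate, query_url):
    """True when both URLs share exactly the same host; paths are ignored."""
    return host_of(candidate) == host_of(query_url)


print(same_host("archive.org/contact/", "archive.org/about/"))
print(same_host("blog.archive.org/", "archive.org/about/"))
```

The first call compares two paths on the same host, the second compares a subdomain against its parent host.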

matchType=domain

If you want to include everything from a domain and all its subdomains, use matchType=domain.

Example:

 
http://web.archive.org/cdx/search/cdx?url=archive.org/about/&matchType=domain

This includes content from:

  • archive.org

  • blog.archive.org

  • media.archive.org

  • Any other subdomains

Note: matchType=domain works only if the underlying CDX index is stored in SURT (Sort-friendly URI Reordering Transform) format, which is common for large-scale Wayback installations.
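The reason SURT ordering matters is that it writes the host labels in reverse, so a domain and all of its subdomains share a common key prefix and sit next to each other in a sorted index. The sketch below is a deliberately simplified approximation of such a key (reversed host labels, a closing parenthesis, then the path). The names simple_surt and in_domain are made up for this example, and real SURT canonicalization handles more cases than this.

```python
from urllib.parse import urlsplit


def simple_surt(url):
    """Build a simplified SURT-style key, e.g. 'org,archive)/about/'."""
    if "://" not in url:
        url = "//" + url  # let urlsplit recognize the host
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")
    return ",".join(reversed(labels)) + ")" + (parts.path or "/")


def in_domain(url, domain):
    """True when url is on the domain itself or on one of its subdomains."""
    base = ",".join(reversed(domain.lower().split(".")))
    # ")" ends the host (the domain itself); "," introduces a subdomain label.
    return simple_surt(url).startswith((base + ")", base + ","))


for u in ["archive.org/about/", "blog.archive.org/", "archiveteam.org/"]:
    print(simple_surt(u), in_domain(u, "archive.org"))
```

Checking for the ")" or "," right after the base keeps a lookalike host such as archiveteam.org from matching archive.org.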

Using Wildcards as a Shortcut

You can skip the matchType parameter entirely by using wildcard-style URLs.

  • If your URL ends with /*, the query is treated as matchType=prefix
    Example:
    url=archive.org/* → equivalent to url=archive.org/&matchType=prefix

  • If your URL starts with *. (e.g., *.archive.org/), the query is treated as matchType=domain
    Example:
    url=*.archive.org/ → equivalent to url=archive.org/&matchType=domain

These shortcuts are handy if you're constructing URLs manually or working with tools that accept wildcards.
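If your own tool accepts wildcard input, you may want to translate it into an explicit url and matchType pair before building the request. Here is a minimal sketch of the two shortcuts described above. resolve_wildcard is a hypothetical helper, and a URL that combines both wildcards is not handled specially.

```python
def resolve_wildcard(url):
    """Map a wildcard-style URL to an explicit (url, matchType) pair."""
    if url.startswith("*."):
        # "*.archive.org/" -> domain match on "archive.org/"
        return url[2:], "domain"
    if url.endswith("/*"):
        # "archive.org/*" -> prefix match on "archive.org/"
        return url[:-1], "prefix"
    # No wildcard: fall back to the default behavior.
    return url, "exact"


print(resolve_wildcard("archive.org/*"))
print(resolve_wildcard("*.archive.org/"))
print(resolve_wildcard("archive.org/about/"))
```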

When to Use Each Option

Use Case                              Suggested matchType
Just one page                         exact
A section or directory                prefix
A full website without subdomains     host
A website and all its subdomains      domain

Understanding matchType gives you fine-grained control when working with raw data from the Internet Archive’s Wayback Machine, especially if you're analyzing sites, recovering content, or building tools like we do at Smartial.net.

If you want to see this in action or experiment with archived data, check out our free tools that interact with the Wayback Machine directly.
