How to Use matchType in Wayback Machine CDX Queries
When using the CDX API from the Wayback Machine, the matchType parameter lets you control how broad or narrow your search results should be. This gives you powerful filtering options when retrieving archived pages, especially if you want more than just exact matches.
By default, the CDX server only returns results for the exact URL you specify. But with matchType, you can expand that to include all pages under a path, all results from a host, or even everything from a domain and its subdomains.
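To make the parameter concrete, here is a minimal sketch of how a query URL could be assembled. The endpoint address and the helper name build_cdx_url are assumptions for illustration and do not come from this article; only the url and matchType parameters are taken from the text.

```python
from urllib.parse import urlencode

# Assumption: the public CDX endpoint lives at this address.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# The four values described in this article.
VALID_MATCH_TYPES = ("exact", "prefix", "host", "domain")


def build_cdx_url(url, match_type="exact"):
    """Return a CDX query URL (hypothetical helper, not an official client)."""
    if match_type not in VALID_MATCH_TYPES:
        raise ValueError(f"unknown matchType: {match_type!r}")
    # safe="/" keeps slashes readable instead of encoding them as %2F.
    query = urlencode({"url": url, "matchType": match_type}, safe="/")
    return f"{CDX_ENDPOINT}?{query}"


print(build_cdx_url("archive.org/about/", "prefix"))
```

Rejecting unknown values up front is a design choice of this sketch; a real server may instead ignore or reject them in its own way.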
Here’s a breakdown of how it works...
matchType=exact
This is the default behavior. If you don’t specify any matchType, the CDX server looks for only exact matches to the URL you’ve entered.
Example: a query for url=archive.org/about/ returns snapshots only for the exact URL archive.org/about/, not for archive.org/about/team or anything else under it.
matchType=prefix
Use this when you want to get all archived pages that start with a certain path.
Example: with url=archive.org/about/ and matchType=prefix, the query returns not just archive.org/about/ but also archive.org/about/team, archive.org/about/history, and any other pages that begin with the same URL prefix.
It’s useful if you want to explore all content under a directory or section of a site.
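The difference between exact and prefix can be approximated locally with plain string comparisons. This is only a sketch: the captured list is made-up sample data, select is a hypothetical helper, and a real CDX server may normalize URLs before comparing them, which this code does not attempt.

```python
# Hypothetical sample of archived URLs, for illustration only.
captured = [
    "archive.org/about/",
    "archive.org/about/team",
    "archive.org/about/history",
    "archive.org/contact/",
]


def select(urls, query, match_type):
    """Approximate exact and prefix matching with simple string checks."""
    if match_type == "exact":
        return [u for u in urls if u == query]
    if match_type == "prefix":
        return [u for u in urls if u.startswith(query)]
    raise ValueError(f"unsupported matchType in this sketch: {match_type!r}")


exact_hits = select(captured, "archive.org/about/", "exact")
prefix_hits = select(captured, "archive.org/about/", "prefix")
print(exact_hits)   # only the page itself
print(prefix_hits)  # the page plus everything under it
```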
matchType=host
This option retrieves all archived content from a specific host, without touching subdomains.
Example: with url=archive.org and matchType=host, the results include archive.org/, archive.org/about/, and archive.org/contact/, but not content from subdomains like blog.archive.org.
matchType=domain
If you want to include everything from a domain and all its subdomains, use matchType=domain.
Example: with url=archive.org and matchType=domain, the results include content from:
archive.org
blog.archive.org
media.archive.org
Any other subdomains
Note: matchType=domain works only if the underlying CDX index is stored in SURT (Sort-friendly URI Reordering Transform) format, which is common for large-scale Wayback installations.
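The SURT requirement is easier to see with a small sketch. SURT writes the host labels in reverse order, so a domain and its subdomains share a common leading string and a domain query becomes a prefix comparison. The code below is a deliberately simplified illustration that handles hostnames only; surt_host and host_matches are hypothetical names, and real SURT canonicalization covers more (scheme, path, and other normalization) than is shown here.

```python
def surt_host(host):
    """Reverse hostname labels: 'blog.archive.org' -> 'org,archive,blog'."""
    return ",".join(reversed(host.lower().split(".")))


def host_matches(query_host, candidate_host, match_type):
    """Compare two hostnames the way host and domain matching are described."""
    q = surt_host(query_host)
    c = surt_host(candidate_host)
    if match_type == "host":
        # Same host only; subdomains are excluded.
        return c == q
    if match_type == "domain":
        # Same host, or any host whose reversed labels extend the query's.
        return c == q or c.startswith(q + ",")
    raise ValueError(f"unsupported matchType in this sketch: {match_type!r}")


print(surt_host("blog.archive.org"))
print(host_matches("archive.org", "blog.archive.org", "host"))
print(host_matches("archive.org", "blog.archive.org", "domain"))
```

Appending the comma before the prefix check keeps an unrelated host such as notarchive.org from matching a query for archive.org.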
Using Wildcards as a Shortcut
You can skip the matchType parameter entirely by using wildcard-style URLs.
If your URL ends with /*, it behaves like matchType=prefix.
Example: url=archive.org/* → interpreted as matchType=prefix
If your URL starts with *. (e.g., *.archive.org/), it behaves like matchType=domain.
Example: url=*.archive.org/ → interpreted as matchType=domain
These shortcuts are handy if you're constructing URLs manually or working with tools that accept wildcards.
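If you are writing your own tooling, the two wildcard rules above could be mirrored as follows. This is a sketch of the rules as stated in this article, not the server's actual parsing code; resolve_wildcard is a hypothetical helper, and stripping only the trailing asterisk (keeping the slash) is an assumption.

```python
def resolve_wildcard(url):
    """Return (url_without_wildcard, match_type) for a wildcard-style URL."""
    if url.startswith("*."):
        # Leading "*." means the domain and all of its subdomains.
        return url[2:], "domain"
    if url.endswith("/*"):
        # Trailing "/*" means everything under this path; keep the slash.
        return url[:-1], "prefix"
    # No wildcard: fall back to the default behavior.
    return url, "exact"


for example in ("archive.org/*", "*.archive.org/", "archive.org/about/"):
    print(example, "->", resolve_wildcard(example))
```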
When to Use Each Option
Use Case | Suggested matchType |
---|---|
Just one page | exact |
A section or directory | prefix |
A full website without subdomains | host |
A website and all its subdomains | domain |
Understanding matchType gives you fine-grained control when working with the Wayback Machine’s raw data, especially if you're analyzing sites, recovering content, or building tools like we do at Smartial.net.
If you want to see this in action or experiment with archived data, check out our free tools that interact with the Wayback Machine directly.