How to Control Result Limits in Wayback Machine CDX Queries

A Wayback Machine query through the CDX API can return thousands, or even millions, of archived snapshots, depending on the URL. To keep your queries manageable and avoid overloading your scripts, tools, or browser, you need to know how to limit the results.

The CDX API provides flexible options for controlling both the amount and direction of the results returned.

Limiting the Number of Results

The most common way to restrict the size of a response is by using the limit= parameter.

Basic usage:

limit=100

This tells the server to return only the first 100 results, starting from the earliest available snapshot. It’s a good default when you’re testing or sampling.

Example:

http://web.archive.org/cdx/search/cdx?url=example.com&limit=100

This fetches the first 100 records for example.com.
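Here is a minimal Python sketch of the same request using only the standard library. It assumes the plain-text output format, where each line is one capture. The function names first_n_url and fetch_first_n are illustrative, not part of any official client.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URL above.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def first_n_url(target: str, n: int = 100) -> str:
    """Build a CDX query URL asking for the first n captures of target."""
    return f"{CDX_ENDPOINT}?{urlencode({'url': target, 'limit': n})}"


def fetch_first_n(target: str, n: int = 100) -> list:
    """Send the query and return the non-empty response lines (one per capture)."""
    with urlopen(first_n_url(target, n), timeout=30) as resp:
        text = resp.read().decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]
```

Calling fetch_first_n("example.com") performs the live request; first_n_url on its own is handy for checking the exact URL before sending it.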

Getting the Most Recent Results

If you're only interested in the most recent captures (instead of the oldest ones), you can request results from the end of the list by using a negative limit.

limit=-5

This returns the last 5 snapshots, typically the most recent captures available.

Example:

http://web.archive.org/cdx/search/cdx?url=example.com&limit=-5

Keep in mind that returning results from the end can take longer to process, because the server has to scan through everything and keep only the last few entries.
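To see why a negative limit costs more, consider how you would keep only the last N items of a stream you can read only from the front. The Python sketch below is a client-side analogy, not the server's actual implementation: a bounded deque keeps memory small, but every record still has to be read before the final N are known.

```python
from collections import deque
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def last_n(records: Iterable[T], n: int) -> List[T]:
    """Return the final n items of records after one full pass.

    A deque with maxlen=n drops its oldest item whenever a new one is
    appended, so memory stays bounded, but the whole input is still scanned.
    """
    window = deque(maxlen=n)
    for record in records:
        window.append(record)
    return list(window)
```

For example, last_n(range(10), 5) yields [5, 6, 7, 8, 9], but only after all ten values have been visited.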

Improving Speed with fastLatest

To speed up queries for the latest snapshots, you can add fastLatest=true to the query. This is especially useful when you're using limit=-1 or a small negative limit.

This setting activates a shortcut on the server that quickly locates the most recent capture for exact-match URLs.

Example:

http://web.archive.org/cdx/search/cdx?url=example.com&fastLatest=true&limit=-1

This returns at least one of the most recent records, and may return more depending on how the index is stored on the backend.

Use this when you’re mainly interested in checking the current state of a page in the archive.
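Because a fastLatest query may return more than one record, it helps to pick the newest one on the client. The sketch below assumes the default plain-text field order (urlkey, timestamp, original, ...) and 14-digit YYYYMMDDhhmmss timestamps, which sort correctly as strings. The helper names are hypothetical.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def fast_latest_url(target: str, n: int = 1) -> str:
    """Build a query for the latest n captures using the fastLatest shortcut."""
    params = {"url": target, "fastLatest": "true", "limit": -n}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def most_recent(lines: List[str]) -> Optional[Tuple[str, str]]:
    """Return (timestamp, original URL) of the newest capture in the response.

    Assumes the default field order, so the timestamp is the second field
    and the original URL the third. Fixed-width timestamps compare
    chronologically as plain strings.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return None
    newest = max(rows, key=lambda fields: fields[1])
    return newest[1], newest[2]
```

fast_latest_url("example.com") reproduces the example URL above, and most_recent returns None when the response is empty.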

Skipping Results with offset

The offset= parameter lets you skip a number of records at the beginning of the result set. It works together with limit= to let you scroll through large sets of results.

Example:

http://web.archive.org/cdx/search/cdx?url=example.com&limit=100&offset=200

This returns records 201 through 300, assuming enough records exist.

While useful for pagination or step-by-step analysis, keep in mind that offset-based queries may become slower as the offset increases. The CDX server has to scan and discard results up to that point each time the query is run.
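The loop below sketches offset-based pagination in Python. fetch_page is a hypothetical callable standing in for one CDX request with the given limit= and offset= values. Keeping it separate from the network code makes the paging logic easy to test. The loop stops when a page comes back shorter than the page size, which it takes to mean the end of the result set.

```python
from typing import Callable, Iterator, List

# Hypothetical signature: fetch_page(limit, offset) returns the records of one
# CDX request such as ...&limit=<limit>&offset=<offset>.
FetchPage = Callable[[int, int], List[str]]


def paginate(fetch_page: FetchPage, page_size: int = 100) -> Iterator[str]:
    """Yield every record by advancing offset one page at a time."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size
```

If each request costs time proportional to its offset, as described above, fetching k pages this way does work roughly proportional to 1 + 2 + ... + k pages in total, which is why deep pagination slows down.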

Server-side Maximum

Even if you don't specify a limit, the server enforces a hard ceiling on how many records a single query returns. The exact value depends on the server configuration (150,000 results is a typical example). If your query matches more records than that, the excess is silently dropped.

Set limit= explicitly so that any cutoff is one you chose rather than one the server imposed, and use offset= to retrieve the remaining records in batches.
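Because the server drops excess records without warning, a simple client-side check is to treat any response that exactly fills the effective ceiling as possibly incomplete. The Python sketch below assumes the 150,000 example figure as its default. The real cap depends on the server configuration, so server_cap is only a placeholder, and the function name is hypothetical.

```python
from typing import Optional


def may_be_truncated(rows_returned: int,
                     requested_limit: Optional[int] = None,
                     server_cap: int = 150_000) -> bool:
    """Heuristic check for a silently cut-off CDX response.

    The effective ceiling is the smaller of |limit| (negative limits count
    from the end) and the assumed server maximum. A response that reaches
    the ceiling may have had more records behind it.
    """
    ceiling = server_cap
    if requested_limit is not None:
        ceiling = min(abs(requested_limit), server_cap)
    return rows_returned >= ceiling
```

A True result does not prove that data was lost. It only means the next batch, for example the same query with offset= advanced, is worth requesting.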

Summary of Useful Parameters

Parameter          Description
limit=N            Returns the first N results
limit=-N           Returns the last N results
fastLatest=true    Optimizes queries for the latest snapshots
offset=M           Skips the first M results (for pagination)

Understanding how to limit and navigate result sets is critical for working with large archives efficiently. Whether you're trying to get a quick snapshot or dig into historical patterns, these options give you the control you need.

Smartial.net offers user-friendly tools built on top of the Wayback Machine to help you explore, filter, and extract archived content with these parameters already built in.
