The Wayback Machine is like the internet’s attic. Most of it is dust, but when you need an old version of a page, it’s suddenly the most beautiful place on earth.
This guide explains the three APIs you’ll actually use in 2025: Availability (closest snapshot), CDX (list snapshots), and SavePageNow (request a save). I’ll include copy‑paste examples so you can try it without writing a whole program.
TL;DR
- Want the closest snapshot? Use https://archive.org/wayback/available?url=…
- Want a list of captures? Use https://web.archive.org/cdx/search/cdx?url=…
- Want to request a new snapshot? Use SavePageNow (works best with backoff and patience).
- Don’t want to code? Use TinyUtils Wayback Fixer.
1) Availability API (closest snapshot)
This is the simplest endpoint: give it a URL, get back the closest archived snapshot (if one exists). It’s perfect for “does the archive have this?” checks.
Example (closest snapshot)
curl "https://archive.org/wayback/available?url=https://example.com/"
Example (closest to a date)
curl "https://archive.org/wayback/available?url=https://example.com/&timestamp=20200101"
In the JSON response, you’ll typically look for:
archived_snapshots.closest.url, archived_snapshots.closest.timestamp, and archived_snapshots.closest.status.
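If you do end up scripting this, here is a minimal Python sketch of building the request URL and pulling those fields out of the response. The helper names and the sample payload are mine and only illustrative. The `available` flag is my assumption about the response shape, so check it against a real response before relying on it.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(url, timestamp=None):
    """Build an Availability API request URL; urlencode handles the escaping."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the 'closest' snapshot dict, or None when nothing usable exists."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    # Assumption: a usable snapshot carries a truthy "available" flag.
    if not closest or not closest.get("available"):
        return None
    return closest


# Illustrative payload shaped like a typical response (not real data).
sample = json.loads("""
{"url": "https://example.com/",
 "archived_snapshots": {"closest": {
   "status": "200", "available": true,
   "url": "http://web.archive.org/web/20200101000000/https://example.com/",
   "timestamp": "20200101000000"}}}
""")

snapshot = closest_snapshot(sample)
print(snapshot["timestamp"], snapshot["status"], snapshot["url"])
```

An empty `archived_snapshots` object is the "nothing found" case, which is why `closest_snapshot` returns `None` instead of raising.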
2) CDX API (list captures and filter them)
If the Availability API is “give me the closest snapshot”, CDX is “show me everything you’ve got.” You can filter by date range and status code, and dedupe the results.
Example (list captures, JSON output)
curl "https://web.archive.org/cdx/search/cdx?url=example.com/&output=json"
Example (filter to successful captures)
curl "https://web.archive.org/cdx/search/cdx?url=example.com/&output=json&filter=statuscode:200"
Example (date range + fields)
curl "https://web.archive.org/cdx/search/cdx?url=example.com/&output=json&from=2020&to=2025&fl=timestamp,original,statuscode"
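With output=json, the response is a list of rows where the first row holds the field names. A small sketch of turning that into dictionaries, assuming that layout (the function name and sample rows are made up for illustration):

```python
import json


def cdx_rows_to_dicts(rows):
    """Pair each data row with the header row that CDX puts first."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Illustrative rows shaped like output=json with fl=timestamp,original,statuscode.
sample = json.loads("""
[["timestamp", "original", "statuscode"],
 ["20200102030405", "https://example.com/", "200"],
 ["20210102030405", "https://example.com/", "301"]]
""")

captures = cdx_rows_to_dicts(sample)
for capture in captures:
    print(capture["timestamp"], capture["statuscode"])
```

Every value comes back as a string, status codes included, so compare against "200" rather than 200.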
Common CDX parameters
- url: required.
- from / to: date range (YYYY or YYYYMMDDhhmmss).
- filter: filter results (e.g. statuscode:200).
- collapse: dedupe adjacent results (e.g. digest or timestamp:8).
- matchType: exact vs prefix/host/domain matching.
- fl: fields you want back (timestamp, original, statuscode, etc.).
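If you are assembling these queries in a script, string concatenation gets fiddly fast. Here is a rough sketch of a URL builder using only the parameters listed above. `cdx_url` and the `from_` spelling are my own (`from` is a reserved word in Python), not part of any official client.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_url(url, **params):
    """Build a CDX query URL; list values (e.g. fl) are joined with commas."""
    query = {"url": url, "output": "json"}
    for key, value in params.items():
        if value is None:
            continue
        if key == "from_":  # 'from' is a Python keyword, so accept 'from_'
            key = "from"
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        query[key] = value
    # Keep ':', ',' and '/' readable; everything else is percent-encoded.
    return CDX_ENDPOINT + "?" + urlencode(query, safe=":,/")


print(cdx_url("example.com/", from_="2020", to="2025",
              fl=["timestamp", "original", "statuscode"]))
print(cdx_url("example.com/docs/", matchType="prefix", filter="statuscode:200"))
```

The first call reproduces the "date range + fields" curl example. Any `?` or `&` inside the target URL gets percent-encoded, so it cannot leak into the CDX query itself.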
matchType examples (when you want more than one page)
By default you’ll often query one URL. If you want broader coverage:
# One exact URL
curl "https://web.archive.org/cdx/search/cdx?url=example.com/docs/getting-started/&output=json"
# Everything under a path prefix
curl "https://web.archive.org/cdx/search/cdx?url=example.com/docs/&matchType=prefix&output=json"
# Everything on a host
curl "https://web.archive.org/cdx/search/cdx?url=example.com&matchType=host&output=json"
These broader queries can return a lot of data, so combine them with filters (from, to, collapse) unless you enjoy scrolling.
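Practical tip: if you only need one snapshot per day, use collapse=timestamp:8.
It compares only the first 8 digits of the timestamp (YYYYMMDD), so adjacent captures from the same day collapse into one row.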
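To make the collapse idea concrete, here is a local imitation of it in Python. This reflects my understanding of the behavior (adjacent rows sharing the same first N timestamp digits are merged, and the first one is kept). It is not the archive's actual implementation, and the timestamps are invented.

```python
def collapse_by_prefix(timestamps, digits=8):
    """Keep the first of each run of adjacent timestamps sharing a prefix."""
    kept = []
    last_key = None
    for ts in timestamps:
        key = ts[:digits]  # first 8 digits = YYYYMMDD
        if key != last_key:
            kept.append(ts)
            last_key = key
    return kept


# Invented capture timestamps, already in chronological order.
stamps = [
    "20230101080000",
    "20230101120000",
    "20230102090000",
    "20230102230000",
    "20230105000000",
]

daily = collapse_by_prefix(stamps)
print(daily)
```

Passing digits=6 would give roughly one capture per month, and digits=4 one per year.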
Building a Wayback URL
Once you have a timestamp from CDX, you can build a “real” archive URL like this:
https://web.archive.org/web/<TIMESTAMP>/<ORIGINAL_URL>
# Example
https://web.archive.org/web/20231002123456/https://example.com/docs/getting-started/
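The archive rewrites links and assets (images/CSS) to point at archived copies by default. If you want the page as it was originally captured, without that rewriting, add id_ after the timestamp:
https://web.archive.org/web/20231002123456id_/https://example.com/docs/getting-started/
A * in place of the timestamp does something else: it opens the list of captures for that URL rather than a single snapshot.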
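In a script, the basic pattern is a one-line string template. The sketch below adds a sanity check on the timestamp. `wayback_url` is a hypothetical helper, and insisting on exactly 14 digits is my choice because that is what CDX returns (it is stricter than the archive itself).

```python
import re


def wayback_url(timestamp, original_url):
    """Combine a 14-digit CDX timestamp and the original URL into a snapshot URL."""
    if not re.fullmatch(r"\d{14}", timestamp):
        raise ValueError(f"expected a 14-digit timestamp, got {timestamp!r}")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


print(wayback_url("20231002123456", "https://example.com/docs/getting-started/"))
```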
3) SavePageNow (request a new snapshot)
SavePageNow is the “please archive this now” flow. It’s useful when a page is disappearing or when you’re preserving sources. The big two rules:
- Expect rate limits (HTTP 429) and add backoff.
- Not everything can be archived (blocked pages, dynamic apps, robots restrictions, etc.).
A simple way to trigger a save is to visit this URL in a browser:
https://web.archive.org/save/https://example.com/
For bulk and programmatic workflows, you’ll want to use the documented APIs and handle retries.
Be polite (rate limits and retries)
If you’re querying lots of URLs, treat the archive like a shared resource:
- Use a small concurrency (a few requests at a time).
- Retry on 429 and 5xx with a short jitter delay.
- Cache results so you don’t re-hit the same URLs repeatedly.
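One possible shape for the retry rule, as a Python sketch. Exponential backoff with full jitter is my choice here (the advice above only asks for a short jittered delay). `fetch` stands in for whatever HTTP call you make and is expected to return a (status, body) pair; all names and defaults are illustrative.

```python
import random
import time


def should_retry(status):
    """Retry only on rate limiting (429) and server errors (5xx)."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(fetch, max_attempts=5, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status or attempts run out."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = fetch()
        if not should_retry(status):
            break
        if attempt < max_attempts - 1:  # no point sleeping after the last try
            sleep(backoff_delay(attempt))
    return status, body


# Offline demo: a fake fetch that fails twice, then succeeds.
responses = iter([(429, ""), (503, ""), (200, "ok")])
slept = []  # record delays instead of actually sleeping
result = call_with_retries(lambda: next(responses), sleep=slept.append)
print(result, len(slept))
```

Caching is not shown. A dictionary keyed by URL in front of `call_with_retries` is usually enough for a one-off batch.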
Common gotchas (stuff that looks like “the API is broken”)
Most Wayback API confusion comes down to URLs not matching the way you think they match. A few quick checks can save you a lot of head-scratching.
- Try the obvious variants: http vs https, with and without a trailing slash, and with or without www.
- Be careful with query strings: CDX can treat ?ref= and tracking params as “different pages.” Consider stripping obvious junk before you query.
- Redirects change things: you might have snapshots of the final URL, not the original redirecting URL. If your availability result looks empty, try the destination.
- Dynamic sites archive poorly: sometimes you get a shell of HTML and missing JS/API calls. That’s not your fault; it’s just how archiving works.
- Robots rules can bite later: access policies can change over time, and some snapshots become unavailable.
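The first two checks are easy to automate. Below is a sketch that generates the obvious variants and strips tracking parameters. The list of "junk" keys is my guess at common ones, not an official list, so adjust it to your own URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these are the parameters you consider junk; edit to taste.
TRACKING_KEYS = {"ref", "fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)


def strip_tracking(url):
    """Drop tracking query parameters and any #fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


def url_variants(url):
    """Return http/https, www/no-www and trailing-slash variants of a URL."""
    parts = urlsplit(url)
    host = parts.netloc
    bare = host[4:] if host.startswith("www.") else host
    path = parts.path or "/"
    paths = {path.rstrip("/") or "/", path if path.endswith("/") else path + "/"}
    variants = set()
    for scheme in ("http", "https"):
        for candidate_host in (bare, "www." + bare):
            for candidate_path in paths:
                variants.add(
                    urlunsplit((scheme, candidate_host, candidate_path, parts.query, ""))
                )
    return sorted(variants)


clean = strip_tracking("https://example.com/post?id=7&utm_source=x&ref=home")
print(clean)
for variant in url_variants("https://example.com/docs"):
    print(variant)
```

Try the cleaned URL first and fall back to the variants only when it comes up empty, so you are not multiplying every lookup by eight.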
If you only need one working citation link, the fastest test is still: open the snapshot in a browser, scroll, click the key links, and make sure the page actually renders.
No-code option (bulk restore/export)
If what you really want is “give me archived versions for these dead URLs”, you don’t need to code. Use TinyUtils Wayback Fixer to bulk map URLs to snapshots and export the results.
Next steps
Start with the Availability API for a single URL. When you need lists and filters, move to CDX. When you need to preserve a page before it disappears, use SavePageNow (and add backoff).