How to Export Content from Archive.org Snapshots into WordPress or Publii
If you have an old website that no longer exists but is partially or fully saved in the Wayback Machine, you can recover it and bring it back to life - even into modern platforms like WordPress or Publii. This article walks through the full process of exporting content from archive.org snapshots, cleaning it, and migrating it into a working, editable website.
Start by Finding the Most Complete Archive Snapshot
Go to https://archive.org/web and type the full domain of the site you want to recover. Look for a snapshot from a date when the site was functioning properly - preferably when the content was complete and links worked.
Click around to verify:
Pages load correctly
Images and CSS appear as expected
Navigation and menus work
Internal links don’t point to error pages
You can use Smartial’s Wayback Domain Scanner to get a full list of archived URLs for any domain and year. This is extremely useful if you're restoring more than just a homepage.
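If you would rather script this step, the Wayback Machine's public CDX API can return a similar list of captures. The sketch below is a minimal example, not a description of how Smartial's scanner works: the function names are illustrative, example.com is a placeholder, and the filters (status 200 only, one row per unique URL) are just one reasonable default.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain, year):
    """Build a CDX query URL for successful captures of a domain in one year."""
    params = {
        "url": domain,
        "matchType": "domain",       # include subdomains and all paths
        "from": str(year),
        "to": str(year),
        "output": "json",
        "fl": "timestamp,original",  # only the fields we need
        "filter": "statuscode:200",  # skip redirects and error captures
        "collapse": "urlkey",        # one row per unique URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Turn JSON rows (first row is the header) into (timestamp, url) pairs."""
    if not rows:
        return []
    header, *records = rows
    ts = header.index("timestamp")
    orig = header.index("original")
    return [(r[ts], r[orig]) for r in records]


def list_archived_urls(domain, year):
    """Fetch and parse the capture list (needs network access)."""
    with urlopen(build_cdx_query(domain, year), timeout=60) as resp:
        return parse_cdx_rows(json.load(resp))
```

Calling list_archived_urls("example.com", 2012) would give you timestamp/URL pairs that you can feed into the download step.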
Download the Archived Pages and Assets
For blog-based or article sites, consider extracting readable text with Smartial’s Content Extractor, which is ideal for moving content into a new CMS without layout noise.
Other than that, there are several methods to retrieve the content:
Manual Save: Right-click → Save Page As
Wayback Machine Downloader (paid tool)
ArchiveBox or Webrecorder (free, open-source)
Custom scraping scripts for bulk capture
Save HTML files, CSS stylesheets, JS files, and as many images or documents as possible.
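As an illustration of the "custom scraping script" option, here is a minimal sketch of downloading one capture to a local folder. It assumes the id_ modifier in the snapshot URL, which asks the Wayback Machine for the capture as originally stored, without rewritten links. The function names and the restored folder are hypothetical choices, and a real script should also add delays between requests and error handling.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit
from urllib.request import urlopen


def raw_snapshot_url(timestamp, original_url):
    """Snapshot URL with the id_ modifier (capture as originally stored)."""
    return f"https://web.archive.org/web/{timestamp}id_/{original_url}"


def local_path_for(original_url, root="restored"):
    """Map an original URL to a file path under root, mirroring the URL path."""
    path = urlsplit(original_url).path
    if path == "" or path.endswith("/"):
        path += "index.html"  # directory-style URLs become index.html
    # Drop the leading slash and any dot segments so we stay inside root
    parts = [p for p in PurePosixPath(path).parts if p not in ("/", "..", ".")]
    return Path(root, *parts)


def download_capture(timestamp, original_url, root="restored"):
    """Download one capture to disk (needs network access)."""
    target = local_path_for(original_url, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(raw_snapshot_url(timestamp, original_url), timeout=60) as resp:
        target.write_bytes(resp.read())
    return target
```

Mirroring the original URL path on disk keeps relative links between pages and assets working with less cleanup later.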
Clean the HTML for Reuse
Wayback snapshots often include archive-specific code, banners, and wrappers. These need to be removed to avoid layout issues and broken links.
What to clean:
Remove archive navigation toolbars
Fix internal links that point to web.archive.org
Update image paths if they point to external CDN URLs
Simplify inline styles and unnecessary JS
Normalize character encoding (common problem with non-English sites)
You can automate parts of this process with find-and-replace (ideally with regular expressions) in a good code editor.
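The same find-and-replace idea can be scripted. The sketch below covers the first two items in the list and rests on two assumptions you should verify against your own saved files: that the toolbar is wrapped in BEGIN/END WAYBACK TOOLBAR INSERT comments, and that rewritten links use a /web/<timestamp>/ prefix with an optional two-letter modifier such as im_ or cs_. The function name is illustrative.

```python
import re

# Assumed marker comments around the injected toolbar block
TOOLBAR_RE = re.compile(
    r"<!--\s*BEGIN WAYBACK TOOLBAR INSERT\s*-->.*?"
    r"<!--\s*END WAYBACK TOOLBAR INSERT\s*-->",
    re.DOTALL,
)

# Absolute archive prefix, e.g. https://web.archive.org/web/20120101000000im_/
ARCHIVE_PREFIX_RE = re.compile(
    r"(?:https?:)?//web\.archive\.org/web/\d{1,14}(?:[a-z]{2}_)?/"
)

# Same prefix written relative to the archive host, right after a quote or "("
RELATIVE_PREFIX_RE = re.compile(r"""(?<=["'(])/web/\d{1,14}(?:[a-z]{2}_)?/""")


def clean_snapshot_html(html):
    """Remove the toolbar block and strip archive prefixes from URLs."""
    html = TOOLBAR_RE.sub("", html)
    html = ARCHIVE_PREFIX_RE.sub("", html)
    html = RELATIVE_PREFIX_RE.sub("", html)
    return html
```

After this pass, links point at the original domain again. Turning them into relative links or pointing them at the new domain is a separate replacement you can run afterwards.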
If you're unsure how much has changed, consider running the site through the Wayback Domain Auditor to check for suspicious redirects, content shifts, or injected junk.
Choose Your Platform: WordPress vs. Publii
Both WordPress and Publii can work; which one fits better depends on your goals.
Use WordPress if:
You want to continue publishing new content
You need plugins, SEO tools, or dynamic functionality
You’re rebuilding a blog, news site, or complex archive
Use Publii if (like us here on smartial.net):
You prefer a static site that works offline
You want speed and simplicity
You’re rebuilding a portfolio, old blog, or reference site
In the article How to Restore a Full Website from the Wayback Machine, I explained how Publii can be ideal for rebuilding simple websites with a modern feel - and no server dependencies.
Convert Static Pages into CMS-Friendly Format
Now you’ll need to import your cleaned content:
For WordPress:
Create new pages/posts manually and paste cleaned HTML
Or use a plugin like WP All Import (with custom templates)
For blog archives, match original dates/titles/categories
For Publii:
Create a new post in the app and paste the cleaned content
Assign images to media folders and match with the post
Recreate the original URL slug if SEO is a concern
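If you have many pages, it can help to collect titles, slugs, and dates into one CSV file before importing. The sketch below is one possible way to do that. The column names are hypothetical (you would map them to post fields in your import tool's template), and the date is derived from the Wayback capture timestamp, which is only a fallback when the original publish date cannot be read from the page itself.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urlsplit


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def slug_from_url(original_url):
    """Last path segment without its extension: /blog/my-post.html gives my-post."""
    path = urlsplit(original_url).path.rstrip("/")
    last = path.rsplit("/", 1)[-1]
    return last.rsplit(".", 1)[0] if "." in last else last


def timestamp_to_date(timestamp):
    """Wayback timestamp YYYYMMDDhhmmss to YYYY-MM-DD (capture date)."""
    return f"{timestamp[0:4]}-{timestamp[4:6]}-{timestamp[6:8]}"


def build_import_csv(pages):
    """pages: iterable of (timestamp, original_url, cleaned_html) tuples."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    # Hypothetical column names; map them in your import template
    writer.writerow(["title", "slug", "date", "content"])
    for timestamp, url, html in pages:
        writer.writerow(
            [extract_title(html), slug_from_url(url), timestamp_to_date(timestamp), html]
        )
    return buf.getvalue()
```

The same table is also a handy checklist when you paste posts into Publii by hand, since it keeps each original slug next to its content.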
Take your time to replicate the structure (menus, categories, and sidebars) so that it reflects the old site’s layout.
Rebuild or Reuse the Visual Design
You don’t have to restore the exact design unless it has historical or branding value. In most cases, you can:
Use a modern theme with similar layout
Copy over fonts, colors, and logo from screenshots
Rebuild navigation manually to match old structure
This is also a good opportunity to make the site mobile-friendly and faster.
Things to Watch Out For
Duplicate content: If the restored pages are still indexed or live somewhere else, you might need to rewrite them or use canonical tags
Lost media: Some PDFs, videos, or large images might never have been archived
Broken code: JavaScript-based features (search, forms) may no longer work
Legal risks: If the content isn’t yours, you may need permission to republish it
For legal considerations, see the article Can You Use Wayback Machine as Evidence? - it applies to restoration too.
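For the lost-media point in the list above, you can check in advance which assets have no usable capture. The sketch below assumes the Wayback Machine availability endpoint and its JSON response layout (an archived_snapshots object with an optional closest entry). The function names are illustrative, and treating only status 200 as usable is a choice, not a rule.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(asset_url, timestamp):
    """Build the lookup URL for the capture closest to the given timestamp."""
    query = urlencode({"url": asset_url, "timestamp": timestamp})
    return AVAILABILITY_ENDPOINT + "?" + query


def closest_capture(response):
    """Return the closest capture URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest["url"]
    return None


def find_missing_assets(asset_urls, timestamp):
    """List assets with no usable capture (needs network access)."""
    missing = []
    for asset_url in asset_urls:
        with urlopen(availability_query(asset_url, timestamp), timeout=30) as resp:
            if closest_capture(json.load(resp)) is None:
                missing.append(asset_url)
    return missing
```

Anything this reports as missing has to come from another source, such as old backups or the original author.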
Use Smartial Tools for Faster Workflow
To streamline large site restoration:
Use the Wayback Scanner to list every archived URL
Check health and age with the Domain Auditor
Compare year-by-year archive volume with the Expired Domain Comparator (useful if you’re choosing between multiple old domains)
Exporting content from archive.org snapshots into WordPress or Publii isn’t just possible; it’s one of the best ways to bring forgotten web projects back to life. With the right tools and a bit of cleanup, you can rescue lost content and give it a modern home.