What’s the least redundant way to make a site with JavaScript-generated HTML crawlable?

Why didn’t I think of this before! Just use http://phantomjs.org. It’s a headless WebKit browser. You’d build a set of scripted actions to crawl the UI and capture the HTML at every state you’d like. PhantomJS can write that captured HTML out as .html files and save them to your web server.
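A minimal sketch of what one of those capture scripts might look like, assuming the app has finished rendering after a short fixed delay; the URL, delay, and output path are placeholders:

```javascript
// snapshot.js -- minimal sketch of a PhantomJS snapshot script.
// The URL, delay, and output path are placeholders for illustration.
var page = require('webpage').create();
var fs = require('fs');

page.open('http://localhost:8080/#!/products', function (status) {
    if (status !== 'success') {
        console.error('Failed to load page');
        phantom.exit(1);
    }
    // Give the client-side JS a moment to render before capturing.
    window.setTimeout(function () {
        // page.content is the full HTML after the JS has run.
        fs.write('snapshots/products.html', page.content, 'w');
        phantom.exit();
    }, 2000);
});
```

Run it with `phantomjs snapshot.js` and you get a static snapshot a crawler can index.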

The whole thing could be automated on every build/commit (PhantomJS is command-line driven). The JS code you write to crawl the UI will break as you change the UI, but that’s no worse than automated UI testing, and it’s just JavaScript, so you can use jQuery selectors to grab buttons and click them.
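Here’s a sketch of one “crawl the UI” step along those lines, assuming the page itself loads jQuery; the selector and file names are hypothetical:

```javascript
// crawl-step.js -- sketch of capturing two UI states in one run.
// Assumes the page loads jQuery; selector and paths are hypothetical.
var page = require('webpage').create();
var fs = require('fs');

page.open('http://localhost:8080/', function (status) {
    window.setTimeout(function () {
        // Capture the initial state.
        fs.write('snapshots/home.html', page.content, 'w');

        // Click a button inside the page context to reach the next state.
        page.evaluate(function () {
            $('#show-details').click(); // uses the page's own jQuery
        });

        // Wait for the click handler to re-render, then capture again.
        window.setTimeout(function () {
            fs.write('snapshots/details.html', page.content, 'w');
            phantom.exit();
        }, 1000);
    }, 2000);
});
```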

If I had to solve the SEO problem, this is definitely the first approach I’d prototype. Crawl and save, baby. Yessir.
