January 14 2009
Building a Bot Accessible Archive
Tagged Under : archive, bot, bot accessibility, bot accessible, bot accessible archive, robot, search agents

Bot accessibility is about making sure that search agents, known as spiders, robots or “bots” for short, have full access to all the content that you want displayed to the public.
Creating a growing archive that remains accessible to both bots and the public can be instrumental in rapidly increasing the size, strength and overall value of your web property. Like most things in SEO, it is much easier to plan ahead for an effective archive than to reorganize one that has already been indexed.
Rather than focusing on which database to use or how to build with one of the many excellent content management systems out there, I’m going to list a few key points to watch for.
Bot Accessible Archive Dos:
Each Document Should Have Its Own URL
Whether you use pages, posts or anything else, make sure that each article, press release or product has its own URL. Giving each document its own URL is probably the most important aspect of proper archiving.
Persnickety Order
Create a logical hierarchy and group similar subject matter into categories. A clean, informative structure is as much about human accessibility as it is about bot accessibility.
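To make the last two points concrete, here is a minimal sketch in Python. It assumes a hypothetical archive where every document has a category path and a slug; the `/archive/` prefix, the sample entries and the function names are all made up for illustration and are not tied to any particular CMS.

```python
from collections import Counter, defaultdict

# Hypothetical archive entries: (category path, document slug).
DOCUMENTS = [
    (("press-releases", "2008"), "new-office-opens"),
    (("press-releases", "2009"), "annual-results"),
    (("products", "widgets"), "blue-widget"),
]


def build_url(category_path, slug):
    """Join the category hierarchy and the document slug into one URL path."""
    return "/" + "/".join(("archive",) + tuple(category_path) + (slug,)) + "/"


def group_by_category(documents):
    """Group document URLs under their top-level category."""
    groups = defaultdict(list)
    for category_path, slug in documents:
        groups[category_path[0]].append(build_url(category_path, slug))
    return dict(groups)


def find_duplicate_urls(documents):
    """Return URLs claimed by more than one document (ideally an empty list)."""
    counts = Counter(build_url(path, slug) for path, slug in documents)
    return sorted(url for url, count in counts.items() if count > 1)
```

Running `find_duplicate_urls` over the whole archive before launch is a cheap way to confirm that no two documents end up sharing a URL.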
Proper Labeling and Legible URLs
On the server side, convert all database-generated URLs to descriptive URLs. The actual logistics of this have everything to do with your CMS and platform. Labeling the tags and content in your archive in a way that is intuitive to humans will help indexing bots serve your pages to the right searchers.
Legible URLs can convey, right on the search results page (SERP), exactly what is on a page and may even reinforce your target keywords. Searchers will have to rely less on your descriptions and title tags to work out what your page is about.
Example:
Bad Archive Structure
Good Archive Structure
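As a hypothetical illustration of the difference (these URLs are my own assumptions, not taken from a real site): a database-generated address might look like `/archive.php?id=1432`, while a descriptive one might look like `/press-releases/annual-results-2009/`. How you get from one to the other depends on your CMS, but the core step is usually turning a title into a "slug". A rough Python sketch, with made-up function names:

```python
import re
import unicodedata


def slugify(title):
    """Turn a human-readable title into a lowercase, hyphen-separated slug."""
    # Strip accents so that "Café" becomes "Cafe".
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    # Drop hyphens left over at either end.
    return slug.strip("-")


def descriptive_url(category, title):
    """Build a legible URL path from a category name and a document title."""
    return "/{}/{}/".format(slugify(category), slugify(title))


print(descriptive_url("Press Releases", "Annual Results 2009"))
```

Most content management systems already ship with something like this, so treat the sketch as an explanation of the idea rather than something you need to write yourself.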
Bot Accessible Archive Don’ts:
Flash Archives
Yes, I have seen Flash archives that were pretty to look at but cumbersome to navigate and not necessarily bot accessible. Since my guess is that you aren’t trying to impress anyone with your creative prowess as they rummage through your archive, it’s best to apply KISS and focus on functionality.
Ajax Archives and iFrame Archives
Both Ajax and iFrame archives tend to have a single URL and display content through a window on the page. The problem is that when you navigate to the press release or document you were looking for, there is virtually no way to share it without downloading it or copying and pasting it.
PDF Archives
While it is possible for a bot to crawl and index a PDF document, people generally hate to see PDFs in search results unless they are searching specifically for a product manual or academic paper. PDFs make good product brochures and lousy product pages.
PDFs appear to pass value such as anchor text, but from a usability standpoint they can make for awkward link placement. A user clicks on a PDF, it downloads, a link is visible so they click it, and the browser opens a new window. Ick!
If your archive happens to consist of product manuals or academic papers, you might want to consider creating a database-driven archive of pages, each containing an abstract, the author and a way to download the PDF.
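Here is one way such a landing page could be put together, sketched in Python. The `PaperRecord` fields, the HTML layout and the sample file path are assumptions for illustration; in practice the record would come from your database and the markup from your CMS templates.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class PaperRecord:
    """Hypothetical database row describing one PDF in the archive."""
    title: str
    author: str
    abstract: str
    pdf_url: str


def render_landing_page(record):
    """Return an HTML fragment that summarizes the document and links to the PDF."""
    return "\n".join([
        "<article>",
        "  <h1>{}</h1>".format(escape(record.title)),
        "  <p class=\"author\">By {}</p>".format(escape(record.author)),
        "  <p class=\"abstract\">{}</p>".format(escape(record.abstract)),
        "  <a href=\"{}\">Download the PDF</a>".format(escape(record.pdf_url)),
        "</article>",
    ])


record = PaperRecord(
    title="Widgets & Gears",
    author="A. Author",
    abstract="How widgets work.",
    pdf_url="/files/widgets-and-gears.pdf",
)
print(render_landing_page(record))
```

The bot gets a crawlable HTML page with its own URL, and the searcher gets a summary before deciding whether to download the PDF.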
Happy archiving!
Written by Nicholas Ramirez

