June 22 2009
Why does Google ignore my site?
You see this question pop up in many forums. It’s usually posted by someone working on a new site. “Why does Google ignore my site?” may come from someone who has never created a Web site before or someone who hasn’t created a site in a very long time.
In the old days all you had to do was build a site and submit it to search engines. But the search engines that were responsive to direct submission have all vanished or changed their technologies. The rules of submission have changed.
Many people seem to use only the obsolete “ADD YOUR URL” pages that Google and Yahoo! still provide. I haven’t seen either search engine act on a direct submission in years. I have not tested Bing’s submission page.
Very often a major search engine will crawl your new site several times without indexing it. They are aware of your site but they have filters and tests they use to determine whether your site may be a spam site. Think of these tests as credibility tests.
The first credibility test may consist of nothing more than a quick check to see if you’re doing anything that violates search engine guidelines. Merely passing this test does not ensure the search engine will include your site in its index. All it does is ensure your site won’t be banned before it gets indexed.
The second credibility test may consist of little more than a quick count of inbound link references. It doesn’t have to mean anything, but if a new site is already capturing links it could mean either that there is some compelling content on the site or that it has some promotional backing behind it. Promotional links don’t mean the site is spammy, but a LOT of promotional links might look suspicious.
If your new site doesn’t have any inbound links then search engines like Google, Yahoo!, Microsoft, and Ask may wait before adding it to their index. They all maintain at least two Web indexes and several of the major search engines have been known to show sites from their secondary indexes (Ask appears to NOT show results from its secondary index).
If Google seems to be ignoring your new site and you’re doing nothing wrong, the most likely reason is that you just don’t have any inbound links to validate your site. But before you rush out to create social media profiles to get some quick links, think about where you can get truly credible, authenticating links. Maybe (if your site is a commercial site) you want to buy some listings in major business directories. Be sure they don’t use “rel=’nofollow’” on their links, however.
Once you start getting links to your site you have to wait for the search engines to recrawl AND reindex those pages. Recrawling may take up to 2 months. Reindexing may take another 1-2 weeks. Getting links from pages that are recached frequently is better than getting links from pages that are only occasionally recached. But just because a page has a high recache rate doesn’t mean its links will pass value. If the page has been identified by the search engines as a link seller, it may never pass any value.
To be credible, a link has to come from a reliable site (a site that the search engines trust); the link should be (in my opinion) embedded in content that is topically related to your content; and the link anchor text should agree with copy on your page. Will a search engine allow links to pass value if they don’t meet these three criteria? Yes. However, you’re more likely to have credible links if they comply with this minimal standard.
If you have credible links and you’re sure the search engine has found them, but your site still does not appear in search results, there may be technical issues you have to resolve. You should have built a checklist of technical things (robots.txt, consistent internal URL structure, appropriate meta tag content, etc.) when you launched your site. Go back over your checklist. You might have left a block in place that prevents the engines from indexing your site.
Google will ignore your site if you tell it to. So will Ask, Yahoo!, and Microsoft. You can tell a search engine to ignore your site with the Disallow directive in robots.txt, by breaking your internal links (clicking on them will tell you if they are broken), by using the wrong kind of links (e.g., image maps, Flash links, links in floating navigation bars, etc.), and with the “noindex,noarchive” options in robots meta tags. You can either use the generic “robots” or name specific search engines like “slurp” (for Yahoo!), “googlebot” (for Google), “teoma” (for Ask), and “msnbot” (for Microsoft).
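As a rough sketch of two of the blocks described above, the following Python uses only the standard library to test whether a robots.txt Disallow rule shuts out a crawler and whether a page carries “noindex” in a robots meta tag. The sample robots.txt text, the example.com URLs, and the helper names (is_blocked, RobotsMetaFinder, has_noindex) are assumptions made for illustration, not part of any search engine’s tooling.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt left over from a development site (an assumption).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt, user_agent, url):
    """Return True if the robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class RobotsMetaFinder(HTMLParser):
    """Collect directives from generic or crawler-specific robots meta tags."""

    # The generic name plus the crawler names mentioned in the article.
    CRAWLER_NAMES = {"robots", "googlebot", "slurp", "teoma", "msnbot"}

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name not in self.CRAWLER_NAMES:
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if any robots meta tag in the HTML contains 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

Calling is_blocked(SAMPLE_ROBOTS_TXT, "googlebot", "http://www.example.com/page.html") reports the leftover block, and has_noindex() can be run over the HTML of each page on your launch checklist.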
Using “rel=’nofollow’” on internal links will also prevent search engines from fully crawling and indexing your site.
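One way to catch stray nofollow attributes on your own navigation is to scan each page for internal links that carry them. The sketch below is a minimal example under stated assumptions: it treats a link as internal when its host matches the host of the page it appears on, and the class name, function name, and example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowInternalLinks(HTMLParser):
    """Collect internal links whose rel attribute contains 'nofollow'."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlparse(page_url).netloc.lower()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # Resolve relative links against the page they appear on.
        target = urljoin(self.page_url, href)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        is_internal = urlparse(target).netloc.lower() == self.site_host
        if is_internal and "nofollow" in rel_tokens:
            self.flagged.append(target)


def find_nofollow_internal_links(page_url, html):
    """Return the absolute URLs of internal links marked rel='nofollow'."""
    finder = NofollowInternalLinks(page_url)
    finder.feed(html)
    return finder.flagged
```

Any URL this returns is a page you are asking the search engines not to reach through that link, so an empty list is the result you want for your own site.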
Generally speaking, I have noticed that search engines are less likely to show 1-page sites than they are to show 100-page sites. That’s not a hard-and-fast rule, because with enough inbound links you can get a 1-page site to appear in the index. But the more content pages you have on a site, the more your own internal linking can help validate your site. Good site structure and robust site content help tremendously with indexing because every page a search engine dissects will add links to the search engine’s crawl queue.
You can also use sitemaps to help get your pages crawled and indexed faster. There are four types of sitemaps you can create for your Web site.
- HTML Sitemap – This is for your visitors and can be broken up into multiple pages, but search engines love HTML sitemaps.
- XML Sitemap – This is only for search engines, and each sitemap file can list up to 50,000 URLs.
- XML Sitemap Index – Basically a listing in XML format of all your XML sitemaps. An index can list up to 50,000 sitemaps, too (but that doesn’t mean you should hope you’ll get 2,500,000,000 URLs crawled).
- TXT sitemaps – These are alternative formats that provide limited value but they work for sites that don’t have the ability to create or host XML sitemaps.
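The XML, XML index, and TXT formats listed above can be sketched in a few lines of Python with the standard library. This is only an illustration: it assumes the 50,000 figure is a per-file cap on URLs, as the sitemaps.org protocol defines it, and the function names and example.com URLs are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed per-file cap on URLs (see the lead-in above).
MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a long URL list into chunks that each fit in one sitemap file."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_sitemap(urls):
    """Return one XML sitemap, as a string, listing the given page URLs."""
    if len(urls) > MAX_URLS_PER_SITEMAP:
        raise ValueError("a single sitemap may list at most 50,000 URLs")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return an XML sitemap index, as a string, listing sitemap file URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")


def build_txt_sitemap(urls):
    """Return a TXT sitemap: one URL per line."""
    return "\n".join(urls) + "\n"
```

A small site needs only build_sitemap(); the index only becomes useful once split_into_sitemaps() returns more than one chunk.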
You can include links to your XML and TXT sitemaps in your “robots.txt” file. Doing this helps speed up the crawling process a little bit, but the fastest way to get a site crawled is to authenticate/register the site with a search engine’s Webmaster console (Google, Yahoo!, and Microsoft all offer this option) and submit the sitemap there. A couple of search engines (Ask and Bing) allow you to directly ping their sitemap crawlers.
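Sitemap links in robots.txt take the form of “Sitemap:” lines, one per sitemap URL. The sketch below writes such a file and reads the entries back with Python’s standard robots.txt parser (site_maps() needs Python 3.8 or later). The helper names and the example.com URLs are assumptions for illustration, and the generated file allows all crawling, which may not suit every site.

```python
from urllib.robotparser import RobotFileParser


def robots_txt_with_sitemaps(sitemap_urls):
    """Build a robots.txt that allows all crawling and lists the sitemaps."""
    lines = ["User-agent: *", "Disallow:", ""]
    lines += [f"Sitemap: {url}" for url in sitemap_urls]
    return "\n".join(lines) + "\n"


def listed_sitemaps(robots_txt):
    """Return the sitemap URLs a robots.txt text declares (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []
```

Running listed_sitemaps() over your live robots.txt is a quick way to confirm the sitemap lines are written the way a parser expects them.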
Simply getting crawled doesn’t ensure you’ll be indexed, but if you have a couple hundred pages on your site you should see at least some of them appear in the US search results within about 2 weeks for US search engines. I’ve noticed that non-US top-level domains may take up to 2 months to appear in the US search engines regardless of how many links point to them (except for sites that draw hundreds or thousands of links from diverse resources naturally). I cannot explain the apparent bias against non-US top-level domains in US search but I don’t consider it to be anything people should worry about.
Once your site appears in the search index you should start testing its search visibility. That means you should start looking for page titles from your site. If they don’t appear in the search results (or if they are buried way down past the first page) that probably means one of two things: 1) you’re targeting competitive keywords OR 2) you’re stuck in the Supplemental Results Index.
You have to be patient. It may take up to a month after your site first appears in the search index before it reaches that equilibrium point where its rankings will hover. A normal site’s rankings per page fluctuate up to 5 positions over the course of a week. Your pages can go up and they can go down. Checking your rankings more frequently than once a week will drive you insane with needless worry.
If you’re trying to invade a competitive query you probably will need some links, but don’t go on a link-building binge. Instead, develop a linking strategy that doesn’t obsess over search results. Every link that sends you direct traffic makes you less dependent upon search engines. A happy Webmaster couldn’t care less about why Google ignores his site.
Written by Michael Martinez




