How to prevent Google from indexing your site’s sitemap.xml file
Have you ever searched for your own website in Google and noticed the sitemap.xml file appearing in the search results? If you haven’t, it’s probably not worth bothering about. If, like me, you have, it might be worth changing.
Every site should have a sitemap, primarily to tell search engines about pages on your site that they might not otherwise discover. As far as I can tell, though, there’s no reason why the sitemap itself should appear in the SERPs, since sitemap.xml isn’t a page that anybody visiting your site will normally understand, let alone want to visit.
Here’s what I saw a few weeks ago after searching for ‘WinningWP’ in Google:
…hardly ideal, since this space could (and probably should) be better taken up by a more useful/engaging page.
So what to do about it? Well, you could do nothing and hope that Google eventually realizes your site has more worthy pages to display. Or, since there seems to be no decent reason to have Google (and other search engines) show this page in the SERPs, you could tell them you’d rather it wasn’t shown by attaching a robots value of noindex to it, which asks automated Internet bots not to index it in future.
There are a few different ways to go about this. An XML file has no HTML head to hold a robots meta tag, though, and since our sitemap is generated automatically (we have Google XML Sitemaps generate ours) there isn’t an obvious way to add the noindex value to the generated sitemaps directly. Instead, we’ll have to add it indirectly, by sending it as an X-Robots-Tag HTTP header, which we can do by adding the following code to the .htaccess file (more on this particularly handy file in a future post):
<IfModule mod_headers.c>
  # Header is provided by mod_headers, so only apply this when that module is loaded
  <Files sitemap.xml>
    Header set X-Robots-Tag "noindex"
  </Files>
</IfModule>
…remembering to change ‘sitemap.xml’ to the name of your own XML sitemap.
Did it work?
Once you’ve done the above, head on over to URI Valet, type the complete URL of your sitemap into the ‘URL’ field and hit ‘submit’. If everything has gone to plan, you should now see “X-Robots-Tag: noindex” displayed somewhere down the page.
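If you’d rather check from your own machine, the same test can be sketched in a few lines of Python. This is only a rough illustration, not part of the original solution: the function names has_noindex and fetch_robots_headers are made up for this example, and it assumes your server answers HEAD requests with the same headers it sends for a normal page load.

```python
import sys
import urllib.request


def has_noindex(header_values):
    """Return True if any X-Robots-Tag value contains a noindex directive."""
    for value in header_values:
        # One header value may carry several comma-separated directives,
        # optionally prefixed with a bot name, e.g. "googlebot: noindex, nofollow".
        for directive in value.split(","):
            if directive.split(":")[-1].strip().lower() == "noindex":
                return True
    return False


def fetch_robots_headers(url):
    """Send a HEAD request and return every X-Robots-Tag header value found."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        # get_all returns None when the header is absent, so fall back to [].
        return response.headers.get_all("X-Robots-Tag") or []


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_noindex.py <sitemap URL>
    values = fetch_robots_headers(sys.argv[1])
    print("noindex found" if has_noindex(values) else "noindex not found", values)
```

Run it with the full URL of your own sitemap as the only argument. If it reports that noindex was not found, the usual suspects are a sitemap file name that doesn’t match the one in .htaccess, or a server that doesn’t have mod_headers enabled.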
Acknowledgements: original solution attributed to JohnMu
What do you think? Is this something worth bothering about?