Your robots.txt file is a powerful tool when you’re working on a website’s SEO – but it should be handled with care. It allows you to deny search engines access to different files and folders, but often that’s not the best way to optimize your site. Here, we’ll explain how we think webmasters should use their robots.txt file, and propose a ‘best practice’ approach suitable for most websites.
You’ll find a robots.txt example that works for the vast majority of WordPress websites further down this page. If want to know more about how your robots.txt file works, you can read our ultimate guide to robots.txt.
What does “best practice” look like?
Search engines continually improve the way in which they crawl the web and index content. That means what used to be best practice a few years ago doesn’t work anymore, or, may even harm your site.
Today, best practice means relying on your robots.txt file as little as possible. In fact, it’s only really necessary to block URLs in your robots.txt file when you have complex technical challenges (e.g., a large eCommerce website with faceted navigation), or when there’s no other option.
Blocking URLs via robots.txt is a ‘brute force’ approach, and can cause more problems than it solves.
For most WordPress sites, the following example is best practice:
We even use this approach in our own robots.txt file.
What does this code do?
If you have to disallow URLs
If you want to prevent search engines from crawling or indexing certain parts of your WordPress site, it’s almost always better to do so by adding meta robots tags or robots HTTP headers.
Our ultimate guide to meta robots tags explains how you can manage crawling and indexing ‘the right way’, and our Yoast SEO plugin provides the tools to help you implement those tags on your pages.
If your site has crawling or indexing challenges that can’t be fixed via meta robots tags or HTTP headers, or if you need to prevent crawler access for other reasons, you should read our ultimate guide to robots.txt.
Note that WordPress and Yoast SEO already automatically prevent indexing of some sensitive files and URLs, like your WordPress admin area (via an x-robots HTTP header).
Why is this ‘minimalism’ best practice?
Robots.txt creates dead ends
Before you can compete for visibility in the search results, search engines need to discover, crawl and index your pages. If you’ve blocked certain URLs via robots.txt, search engines can no longer crawl through those pages to discover others. That might mean that key pages don’t get discovered.
One of the basic rules of SEO is that links from other pages can influence your performance. If a URL is blocked, not only won’t search engines crawl it, but they also might not distribute any ‘link value’ pointing to that URL to, or through that URL to other pages on the site.
Google fully renders your site
Previous best practice of blocking access to your
wp-includes directory and your plugins directory via
robots.txt is no longer valid, which is why we worked with WordPress to remove the default disallow rule for
wp-includes in version 4.0.
The robots.txt standard supports adding a link to your XML sitemap(s) to the file. This helps search engines to discover the location and contents of your site.
We’ve always felt that this was redundant; you should already by adding your sitemap to your Google Search Console and Bing Webmaster Tools accounts in order to access analytics and performance data. If you’ve done that, then you don’t need the reference in your robots.txt file.
This content was originally published here.