A robots.txt file is a way for your website to communicate with web crawlers. It’s a great tool for webmasters to control how web crawlers access the pages on your site. Taking control of what web crawlers can and cannot see is an incredibly useful way to boost your SEO efforts. Residing at the root of your site, the robots.txt file is a collection of rules that block or allow certain crawlers from accessing certain sections of your website. These crawl instructions are specified by “disallowing” or “allowing” certain user agents (web crawlers) to take action.
Below, we have listed the three main ways that we like to use a robots.txt file to influence web crawlers.
The primary purpose of a robots.txt file is to block or allow web crawlers access to certain URLs on your website. The main reason for doing this is to avoid overloading your site with requests. The basic format is as follows:
User-agent: [user-agent name]
Disallow: [URL string or subfolder not to be crawled]
This is how you form a basic robots.txt file for your website. There can be multiple user agents as well as multiple disallowed URLs. It is important to note that this is not a mechanism for keeping a webpage off Google. If you’re looking to remove a link from Google search, check out our blog post on how to do so.
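To illustrate the point about multiple user agents and multiple disallowed URLs, here is a minimal sketch. The folder names are hypothetical placeholders, and Bingbot is used only as an example of a named crawler, so swap in the paths and user agents that apply to your own site.

```text
# Rules for every crawler that has no group of its own
User-agent: *
Disallow: /admin/
Disallow: /basket/

# Rules for one named crawler (example only)
User-agent: Bingbot
Disallow: /internal-search/
```

Keep in mind that a crawler generally follows only the most specific group that matches it, so in this sketch Bingbot would obey its own group rather than the rules listed under `*`.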
Crawl budget is a term coined by the SEO industry to describe the number of pages a search engine will crawl on your site in a given period. Search engines assign a crawl budget so that they can divide their attention across millions of websites.
It’s not something that most of us need to worry about, but if you have a large website, then blocking unimportant URLs from being crawled can mean that your most vital pages are crawled more often. Keeping low-value URLs from being crawled frees up crawl activity for the pages that hold value. You can instruct search engines not to access such URLs in your robots.txt file using rules such as the following:
Blocking all crawlers from a specific folder
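As a minimal sketch of this rule, assuming the folder you want to keep crawlers out of is called /example-folder/ (a hypothetical name, replace it with your own path):

```text
# Applies to all crawlers
User-agent: *
# Blocks every URL whose path starts with /example-folder/
Disallow: /example-folder/
```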
Blocking all crawlers from dynamic URL variants
It’s important to note that blocking crawlers will not remove a link from Google search. Rather, it will stop crawlers from reading the page.
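One common way to do this, sketched here on the assumption that your dynamic variants are the URLs containing a query string (a question mark), is a wildcard rule. The `*` wildcard is supported by the major search engines, but smaller crawlers may not recognise it, so treat this as a starting point rather than a guarantee.

```text
# Applies to all crawlers
User-agent: *
# Blocks any URL that contains a question mark, e.g. /shoes?colour=red
Disallow: /*?
```

Check that none of your important pages rely on query strings before using a rule this broad.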
Your XML sitemap is one of the most important files on your website, so it makes sense to make it as easy as possible for Googlebot (Google’s search engine bot) to locate it. Listing the location of your XML sitemap in your robots.txt file helps search engine bots find all the important pages on your website. The line is usually placed at the bottom of the robots.txt file in the following format:
Sitemap: [sitemap url]
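Filled in, the line might look like the sketch below. The domain and file name are placeholders, so use the real address of your own sitemap, written as a full URL including https://.

```text
Sitemap: https://www.example.com/sitemap.xml
```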
As a separate tip – make sure you also submit your sitemap URL to Google Search Console to make it even easier for Googlebot to find it!
If you ask the question “what can a robots.txt file do for my SEO?”, the three uses above are a good place to start: controlling crawler access, protecting your crawl budget and pointing bots to your sitemap. If you don’t have a robots.txt file, you’ll need to create one from scratch in a plain text editor (such as Notepad). The file then needs to be uploaded to the root of your site, typically via FTP. If you’re not used to working with your site’s files, it might be better to get in touch with SEO experts to help you.
If you need help implementing technical SEO on your website, please do not hesitate to get in contact with Global Search Marketing.