More information about the Robots.txt Generator tool
Create a robots.txt file quickly and easily with the help of this robots.txt generator. A robots.txt file helps search engines index your pages properly. By default, this tool allows major search engines to crawl every part of your website; if there are any areas you would like to exclude, simply add them to the file and upload it to your root directory.
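As a rough sketch of what such a generator does (the helper name and its defaults are illustrative, not this tool's actual code), a minimal version in Python might look like:

```python
def generate_robots_txt(disallowed_paths, sitemap=None):
    """Build a robots.txt that allows all crawlers everywhere by default,
    excluding only the paths listed in disallowed_paths."""
    lines = ["User-agent: *"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        lines.append("Disallow:")  # an empty Disallow means "allow everything"
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"

print(generate_robots_txt(["/private/"], "http://www.example.com/sitemap.xml"))
```

The resulting text would then be saved as robots.txt and uploaded to the root directory of the site.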
A robots.txt file lives at the root of your site. So, for the site www.example.com, the robots.txt file is located at www.example.com/robots.txt. robots.txt is a plain text file that follows the Robots Exclusion Standard. A robots.txt file contains one or more rules. Each rule blocks (or allows) access for a given crawler to a specified file path on that website.
Here's a simple robots.txt file with two rules explained below:
# Group 1
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2
User-agent: *
Allow: /

Sitemap: http://www.example.com/sitemap.xml
The user agent named "Googlebot" should not crawl the folder http://example.com/nogooglebot/ or any of its subdirectories.
All other user agents can access the entire site. (This rule could have been omitted with the same result, because full access is the default assumption.)
The sitemap file for the site is located at http://www.example.com/sitemap.xml.
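The behavior of the two rules above can be sanity-checked with Python's standard urllib.robotparser module (note that this parser implements the original robots exclusion standard, not every Google-specific extension):

```python
from urllib.robotparser import RobotFileParser

rules = """\
# Group 1
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2
User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot is blocked from /nogooglebot/ but may crawl everything else.
print(rp.can_fetch("Googlebot", "http://example.com/nogooglebot/page.html"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/other.html"))             # True
# Any other crawler matches the * group and may crawl the whole site.
print(rp.can_fetch("OtherBot", "http://example.com/nogooglebot/page.html"))   # True
```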
We will provide more detailed examples later.
Here are some basic guidelines for robots.txt files. We recommend that you read the full syntax of robots.txt files, because the robots.txt syntax has some subtle behaviors that you should understand.
Use the robots.txt tester tool to write or edit robots.txt files for your site. This tool enables you to test syntax and behavior against your site.
Format and location rules:
The file must be named robots.txt.
Your site can have only one robots.txt file.
The robots.txt file must be at the root of the website host to which it applies. For example, to control crawling of all URLs below http://www.example.com/, the robots.txt file must be located at http://www.example.com/robots.txt. It cannot be placed in a subdirectory (for example, at http://example.com/pages/robots.txt). If you are unsure how to access your website root, or need permission to do so, contact your web hosting service provider. If you cannot access the root of your website, use an alternative blocking method such as meta tags.
A robots.txt file can be applied to subdomains (for example, http://website.example.com/robots.txt) or to non-standard ports (for example, http://example.com:8181/robots.txt).
Comments are content after a # sign.
robots.txt must be a UTF-8 encoded text file (which includes ASCII). It is not possible to use other character sets.
A robots.txt file contains one or more groups.
Each group has multiple rules or instructions, one instruction per line.
A group gives the following information:
Who the group applies to (the user agent)
Which directories or files that agent can access, and/or
Which directories or files that agent cannot access
Groups are processed from top to bottom, and a user agent can match only one rule set: the first, most specific group that matches that user agent.
The default assumption is that a user agent can crawl any page or directory that is not blocked by a Disallow rule.
The rules are case-sensitive. For example, Disallow: /file.asp applies to http://www.example.com/file.asp, but not to http://www.example.com/FILE.asp.
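This case sensitivity can be observed with Python's standard urllib.robotparser, which also matches paths case-sensitively:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /file.asp",
])

# Path matching is case-sensitive: only the lowercase path is blocked.
print(rp.can_fetch("*", "http://www.example.com/file.asp"))  # False
print(rp.can_fetch("*", "http://www.example.com/FILE.asp"))  # True
```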
The following instructions are used in the robots.txt files:
User-agent: [one or more required per group] The name of the search engine robot (web crawler software) that the rule applies to. This is the first line of any group. Most Google user-agent names are listed in the Web Robots Database or in Google's list of user agents. Using an asterisk (*), as in the examples below, matches all crawlers except the various AdsBot crawlers, which must be named explicitly. (See the list of Google crawler names.) Examples:
# Example 1: Block only Googlebot
User-agent: Googlebot
Disallow: /

# Example 2: Block Googlebot and Adsbot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /

# Example 3: Block all but AdsBot crawlers
User-agent: *
Disallow: /
Disallow: [At least one or more Disallow or Allow entries per rule] A directory or page, relative to the root domain, that should not be crawled by the user agent. If a page, it should be the full page name as shown in the browser; if a directory, it should end in a / mark. Supports the * wildcard for a path prefix, suffix, or entire string.
Allow: [At least one or more Disallow or Allow entries per rule] A directory or page, relative to the root domain, that should be crawled by the user agent just mentioned. This is used to override Disallow to allow crawling of a subdirectory or page in a disallowed directory. If a page, it should be the full page name as shown in the browser; if a directory, it should end in a / mark. Supports the * wildcard for a path prefix, suffix, or entire string.
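The Allow-overrides-Disallow pattern can be sketched with urllib.robotparser. One caveat (an implementation detail of this parser, not of Google's crawler): urllib.robotparser applies rules in file order, first match wins, rather than using Google's most-specific-path rule, so the Allow exception is listed before the broader Disallow here:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /private/public-page.html",  # exception listed first: first match wins
    "Disallow: /private/",
])

# Everything under /private/ is blocked except the explicitly allowed page.
print(rp.can_fetch("*", "http://example.com/private/secret.html"))       # False
print(rp.can_fetch("*", "http://example.com/private/public-page.html"))  # True
```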
Sitemap: [Optional, zero or more per file] The location of a sitemap for this website. Must be a fully-qualified URL; Google doesn't assume or check http/https/www/non-www alternates. Sitemaps are a good way to indicate which content Google should crawl, as opposed to which content it can or cannot crawl. Learn more about sitemaps. Example:
Sitemap: https://example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap.xml
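Sitemap lines can also be read back with urllib.robotparser via its site_maps() method (available in Python 3.8+):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /",
    "Sitemap: https://example.com/sitemap.xml",
    "Sitemap: http://www.example.com/sitemap.xml",
])

# site_maps() returns the listed sitemap URLs (or None if there are none).
print(rp.site_maps())
```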
Lines that match none of these directives are ignored.