What is a robots.txt file? A full guide

Reviewed and Published on: March 24, 2026 at 04:45 PM

Written By

Ashish Adhikari

Writer and Founder

Reviewed By

Angelica Leicht

Product Designer

Managing website visibility can feel like a battle against automated bots. Many site owners struggle with server overload or find their private admin pages indexed in search results because they haven’t defined proper rules for those bots.

The solution lies in a simple yet powerful plain-text document known as the robots.txt file.

In this full guide to robots.txt, you’ll learn how to write robots.txt directives, understand robots.txt syntax, apply advanced AI blocking techniques, and use a robots.txt generator so your website remains professional and secure.

🔑 KEY HIGHLIGHTS

1) Addresses All Bots: User-agent: * acts as a wildcard that refers to all bots and crawlers (like Google, Bing, and AI scrapers).

2) Blocks AI Scrapers: You can use a Disallow rule to tell specific AI agents not to scrape your original content.

3) Optimizes Crawl Budget: A proper robots.txt setup keeps search engines from wasting resources on low-value pages, such as admin areas.

4) Essential SEO Tool: robots.txt is a pillar of technical SEO. It keeps crawlers focused on your high-value pages and away from private folders.

What is a robots.txt file?

A robots.txt file is a plain-text set of rules that tells crawlers which pages to crawl and which to avoid, using Allow and Disallow rules. It is hosted in the root directory of the website and helps manage bot traffic to prevent server overload.

The robots.txt file functions under the Robots Exclusion Protocol (REP) and serves as a "Code of Conduct" for automated agents, including search engine spiders like Googlebot and AI training crawlers.

Robots.txt file example and its explanation

Here is a breakdown of the basic robots.txt syntax and an explanation of each term:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://wpvoid.com/sitemap.xml

a) User-agent: * – It acts as a wildcard that refers to all bots and crawlers (like Google, Bing, and AI scrapers).

b) Disallow: /wp-admin/ – This is a standard SEO practice. It keeps bots from wasting time crawling your backend management pages, which should never appear in Google search results.

c) Allow: /wp-admin/admin-ajax.php – WordPress uses this specific file to load dynamic content (like some menus or popups). Allowing it will ensure that your site functions correctly for the crawler.

d) Sitemap: This provides a direct roadmap for bots to find your latest blog posts and pages.
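The interplay between the Disallow and Allow lines above can be sketched in a few lines of Python. This is a minimal illustration, assuming the longest-match precedence described in the Robots Exclusion Protocol standard (RFC 9309): the rule with the longest matching path wins, and Allow wins a tie. The function name is_allowed is hypothetical, and wildcard characters (* and $) are deliberately left out.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, path_prefix) pairs taken from ONE
    user-agent group. The longest matching prefix wins; on a tie,
    Allow wins. Wildcards (* and $) are not handled in this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The two path rules from the example file, as (directive, path) pairs.
rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

print(is_allowed(rules, "/wp-admin/options.php"))     # False
print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(rules, "/blog/hello-world/"))        # True
```

The Allow rule is longer than the Disallow rule, so it takes precedence for admin-ajax.php while the rest of /wp-admin/ stays blocked.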

Why do you need a robots.txt file?

The robots.txt file might seem like a minor technical detail, but it is a pillar of technical SEO. Here is why it matters for every SEO strategist.

a) Optimizing Crawl Budget: Search engines only spend a limited amount of time on each site. By blocking unimportant pages (like /cart/ or /login/ or /admin/), SEO strategists ensure bots spend their time on your high-value content, making the crawl budget more optimized.

b) Preventing Server Overload: High-frequency bots, particularly newer AI crawlers, can put a massive strain on the web server. A robots.txt file lets you keep compliant bots away from resource-heavy sections, which reduces that load.

c) Blocking Specific Bots: Blocking particular bots is common on media sites. For example, news sites such as Forbes and The New York Times often block AI agents from scraping their articles to ensure their factual reporting isn't used to train LLMs without compensation or credit.

d) Managing Media Indexing: While it's not great for hiding HTML pages, robots.txt is very effective at keeping images, videos, and audio files from showing up in Google Search results.

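Warning: To keep a page entirely private or "noindexed," do not rely on robots.txt. Instead, use password protection or a noindex meta tag. A noindex tag only works if the page is not blocked in robots.txt, because the crawler has to fetch the page to see the tag.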

What to look for when creating a robots.txt file

Creating a robots.txt file is easy, but it only helps if the rules follow a clear strategy. Here is what to consider before writing one.

1) Identifying the Bot Agent: Make sure to tailor instructions to specific bots. For instance, if you want Google to see everything but want to block Bing, you would use User-agent: bingbot.

2) Strategic Blocking: Add a Disallow rule for the usual low-value or private areas of your site, such as:

a) Admin panels (e.g., /wp-admin/)
b) Temporary or staging areas.
c) Search result pages within your own site.
d) Duplicate content versions.

3) Search Console Verification: Once you upload your file, make sure it works. You can use the robots.txt report in Google Search Console.

A) Upload: Ensure it is at yourdomain.com/robots.txt.

B) Verify: Use the Search Console to see if Googlebot encounters any errors or warnings.

C) Check Status: If you accidentally block Googlebot, your pages can drop out of search results. Always double-check!
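Before uploading, the checklist above can also be rehearsed locally. The sketch below uses Python's standard urllib.robotparser module to confirm that a per-bot rule (the Bing example from point 1) behaves as intended and that Googlebot is not blocked by accident. The helper name can_crawl, the two sample files, and the example.com URL are assumptions made for illustration; Search Console remains the authoritative check for how Google reads your file.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical file that blocks Bing's crawler and nothing else.
BLOCK_BING = """\
User-agent: bingbot
Disallow: /
"""

# Hypothetical mistake: this blocks every crawler, Googlebot included.
BLOCK_EVERYONE = """\
User-agent: *
Disallow: /
"""

home = "https://example.com/"
print(can_crawl(BLOCK_BING, "bingbot", home))        # False
print(can_crawl(BLOCK_BING, "Googlebot", home))      # True
print(can_crawl(BLOCK_EVERYONE, "Googlebot", home))  # False
```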

Use Search Console to validate your robots.txt file

You can check whether your robots.txt file is valid in Google’s Search Console. To check the status, follow these simple steps:

Step 1: Open the Search Console.
Step 2: Navigate to Settings on the left-hand side.

[Image: Search Console of WPVoid for robots.txt]

Step 3: Find the Crawling section and open the robots.txt report.

Step 4: Check whether the status shows that the file was fetched.

[Image: robots.txt fetched in WPVoid]

How 4 top sites' robots.txt files look

Here is how four well-known websites handle crawlers in their robots.txt files:

Forbes: Their file demonstrates complex management of massive content volumes, optimizing how search engines crawl and index their extensive article database.

[Image: Forbes robots.txt file]

Instagram: This file serves as a primary example of strict "Disallow" rules, prioritizing user privacy by limiting bot access to sensitive areas.

[Image: Instagram robots.txt file]

Wix: This example highlights how a major web platform manages crawling rules to protect and organize its underlying technical infrastructure.

[Image: Wix robots.txt file]

YouTube: Famous for a humorous "Easter Egg" in its comments, which jokes that the file was created in the distant future (the year 2000) after a robotic uprising. This file balances technical instructions with personality.

[Image: YouTube robots.txt file]

How to create a Robots.txt File

There are two primary ways to generate robots.txt files, depending on your technical comfort level.

a) Create a Robots.txt file manually

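You can write a robots.txt file in any plain-text editor; you don't need special software.

Step 1: Open Notepad (Windows) or TextEdit (Mac). In TextEdit, switch to plain text first (Format > Make Plain Text).

Step 2: Type your directives (e.g., User-agent: * followed by Disallow: /wp-admin/). Avoid a bare Disallow: / unless you intend to block the whole site.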

[Image: robots.txt file in Notepad]

Step 3: Save the file with the name robots.txt.

Step 4: Upload it to the root directory of your website via FTP or your hosting file manager.

[Image: robots.txt upload in the hosting file manager]
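If you would rather generate the file than type it by hand, the manual steps above can be scripted. The following is a minimal generator sketch in Python, not a production tool: the function names render_robots and write_robots and the dictionary layout are assumptions chosen for this example. It writes plain UTF-8 text under the exact file name robots.txt, which is what the upload step expects.

```python
from pathlib import Path


def render_robots(groups, sitemap=None):
    """Build robots.txt text from {user_agent: [(directive, path), ...]}."""
    blocks = []
    for user_agent, rules in groups.items():
        lines = [f"User-agent: {user_agent}"]
        lines += [f"{directive}: {path}" for directive, path in rules]
        blocks.append("\n".join(lines))
    if sitemap:
        blocks.append(f"Sitemap: {sitemap}")
    # One blank line between groups, one trailing newline at the end.
    return "\n\n".join(blocks) + "\n"


def write_robots(directory, text):
    """Save the text as plain UTF-8 under the exact name robots.txt."""
    target = Path(directory) / "robots.txt"
    target.write_text(text, encoding="utf-8")
    return target


text = render_robots(
    {"*": [("Disallow", "/wp-admin/"), ("Allow", "/wp-admin/admin-ajax.php")]},
    sitemap="https://wpvoid.com/sitemap.xml",
)
print(text)
```

Calling write_robots with a local folder produces a file you can then upload to the site root via FTP or your hosting file manager.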

b) Using a WordPress plugin to create a Robots.txt file

If you aren't comfortable with server files, a WordPress SEO plugin (like Yoast SEO or Rank Math) is your best bet.

Step 1: Install an SEO plugin.

Step 2: Navigate to the "Tools" or "File Editor" section.

Step 3: The plugin will provide an interface to edit the robots.txt content without needing to leave your dashboard.

Step-by-step guide to creating a robots.txt file

Setting up a robots.txt file is a fundamental step in technical SEO.

Here is your step-by-step guide to building one from scratch, following professional standards for 2026.

Step 1: Define the User-Agent

Every robots.txt file starts by identifying who the rules apply to.

Syntax: User-agent: *

Meaning of the *: The asterisk (*) is a wildcard. It tells every bot on the internet (Google, Bing, DuckDuckGo, and scrapers) to follow the rules below this line, unless the file also contains a group that names that bot specifically.
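To see how a crawler picks between the wildcard group and a group that names it directly, here is a short sketch using Python's standard urllib.robotparser. The two-group file and the example.com URL are hypothetical and exist only for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical two-group file: one group for everyone, one for GPTBot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

post = "https://example.com/blog/hello-world/"
# Googlebot has no group of its own, so the * group applies.
print(parser.can_fetch("Googlebot", post))  # True
# GPTBot has its own group, which replaces the * group for that bot.
print(parser.can_fetch("GPTBot", post))     # False
```

A bot that finds a group with its own name follows only that group, so rules you want it to obey must be repeated there.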

Step 2: Set Universal Disallows (The "Stay Out" Rules)

Now, you tell those bots which parts of your site are off-limits. For a WordPress site, you want to keep crawlers out of your backend and login pages.

Syntax: Disallow: /wp-admin/

Meaning of syntax: This blocks bots from crawling your administrative dashboard. It saves your "crawl budget" for actual content pages.

Syntax: Disallow: /wp-login.php

Meaning of syntax: Keeps compliant crawlers away from the login page. It does not stop malicious bots, which ignore robots.txt, so it is not a substitute for login protection.
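Disallow rules match by path prefix, which is easy to get subtly wrong. The sketch below, again using Python's standard urllib.robotparser, checks a few paths against the two rules from this step. The sample paths are made up for the example; note that /wp-admin without the trailing slash does not match the rule /wp-admin/.

```python
from urllib.robotparser import RobotFileParser

# The two Disallow rules from this step, under the wildcard group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Prints True when a path may be crawled, False when it is blocked.
for path in ("/wp-admin/users.php", "/wp-login.php", "/wp-admin", "/blog/"):
    print(path, parser.can_fetch("*", path))
# /wp-admin/users.php False
# /wp-login.php False
# /wp-admin True
# /blog/ True
```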

Step 3: Protecting Your Content from AI Bots

If your website publishes original tutorials or news, you may want to prevent AI companies from "scraping" your data to train their models without permission. To do this, you address specific AI user-agents.

1) Blocking OpenAI (ChatGPT/GPT-4)
OpenAI uses a bot called GPTBot. Blocking it tells OpenAI not to crawl your pages for training future versions of ChatGPT.

Syntax:

User-agent: GPTBot
Disallow: /

2) Blocking Anthropic (Claude)
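Anthropic’s crawler is commonly listed under the token anthropic-ai. Crawler tokens change over time (Anthropic's documentation also lists ClaudeBot), so confirm the current name in the vendor's documentation before relying on a rule.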

Syntax:

User-agent: anthropic-ai
Disallow: /

3) Blocking Google’s AI Training (Gemini)
Google-Extended is a control token rather than a separate crawler. Blocking it tells Google not to use your content to improve Gemini or Vertex AI models, and it does not affect how your pages appear in Google Search.

Syntax:

User-agent: Google-Extended
Disallow: /

4) Blocking Common Scrapers (CCBot)
CCBot is the crawler for Common Crawl, a massive data repository often used by various smaller AI companies to train their models.

Syntax:

User-agent: CCBot
Disallow: /

OVERALL SYNTAX:

# Step 1: Rules for all Search Engines
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /xmlrpc.php
Allow: /wp-admin/admin-ajax.php
# Step 2: Low-value pages to keep crawlers out of
Disallow: /search/
Disallow: /author/
Disallow: /tag/
# Step 3: Block specific AI Scrapers (Claude, GPT, Gemini)
User-agent: GPTBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Step 4: Global Sitemap Location
Sitemap: https://wpvoid.com/sitemap_index.xml
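A quick way to sanity-check a combined file like this one is to ask a parser what each bot may fetch. The sketch below uses Python's standard urllib.robotparser on a condensed copy of the file. The helper name audit and the sample post URL are assumptions for illustration. Keep in mind that the standard-library parser applies the first matching rule in file order and treats a blank line as the end of a group, so it can disagree with Google on overlapping Allow and Disallow rules; treat it as a rough local check only.

```python
from urllib.robotparser import RobotFileParser

# Condensed copy of the combined file; blank lines only between groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /search/

User-agent: GPTBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""


def audit(robots_txt, agents, url):
    """Map each user agent to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


agents = ["Googlebot", "bingbot", "GPTBot", "anthropic-ai", "Google-Extended", "CCBot"]
# Hypothetical post URL used only for this check.
report = audit(ROBOTS_TXT, agents, "https://wpvoid.com/blog/sample-post/")
for agent, allowed in report.items():
    print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

With this file, the search engine crawlers are allowed to fetch the post while the four AI-related agents are blocked.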

FAQs about the robots.txt file

Does robots.txt hide my page from Google?

No. It tells Google not to crawl the page. If another website links to your page, Google may still index the URL and show it in search results without a description. To truly hide a page, use the noindex tag.

Where should I put the robots.txt file?

It must be in the root directory. If your site is wpvoid.com, the file must be at wpvoid.com/robots.txt. If you put it in a subdirectory like wpvoid.com/assets/robots.txt, crawlers will never find it.

Why does Google ignore my "Crawl-delay" command?

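Google does not support the Crawl-delay directive; it believes its algorithms are better at determining your server's capacity than a static number. The crawl rate limiter in Google Search Console was retired in early 2024, so to slow Googlebot temporarily, Google's documentation recommends returning 500, 503, or 429 status codes.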

What does "User-agent: * Disallow: /" mean?

This command tells all web crawlers (e.g., Googlebot, Bingbot) not to crawl any part of the website. It does not by itself remove pages that are already indexed.

Can I have more than one robots.txt file?

No. Each domain or subdomain can only have one. If you have blog.mysite.com and mysite.com, they each need their own separate file in their respective root directories.
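The two location rules in this FAQ (the file lives at the root, and each host has its own) can be expressed as a small Python helper. This is a sketch using the standard urllib.parse module; the function name robots_url and the sample page URLs are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain or port); drop the rest.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://wpvoid.com/assets/logo.png"))
# https://wpvoid.com/robots.txt
print(robots_url("https://blog.mysite.com/2026/03/post/?utm=x"))
# https://blog.mysite.com/robots.txt
```

Whatever page a crawler starts from, it looks for the file at the root of that exact host, which is why a copy in a subdirectory is never read.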