Free SEO Tool

Robots.txt Generator with AI Bot Controls

Build a valid robots.txt file in seconds. Control exactly which AI crawlers, training bots, and search engines can access your site - no coding needed.

100% Free · No Sign-up · Instant Results

Upload as a plain text file to https://yourdomain.com/robots.txt. Validate in Google Search Console (search.google.com/search-console) → Settings → robots.txt report, which replaced the older robots.txt Tester.

What Is robots.txt?

robots.txt in the Age of AI Crawlers

robots.txt is a plain text file at the root of your website (/robots.txt) that tells crawlers which pages and directories they are allowed or forbidden to access. Every major search engine respects it - and so do most AI crawlers, including GPTBot, ClaudeBot, and PerplexityBot.

In 2023 and 2024, the list of bots crawling the web expanded dramatically as AI companies began training and serving large language models. Each has its own User-agent string. This tool lets you configure permissions for each bot individually, so you can allow AI answer engines (which bring referral traffic) while blocking AI training crawlers (which scrape content without attribution).

  • Place the file at https://yourdomain.com/robots.txt - it must be in the site root
  • Disallow: / under a User-agent blocks that bot from your entire site
  • Blocking GPTBot stops OpenAI's crawler from fetching your pages, which can reduce your visibility in ChatGPT answers - consider carefully before blocking
  • CCBot (Common Crawl) feeds many LLM training datasets - block it if you do not want your content used for training
  • robots.txt is advisory, not enforced. Malicious bots ignore it, but reputable crawlers generally honour it
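
As a rough illustration of the first two points above, the sketch below derives the root-level robots.txt URL for any page and checks what a `Disallow: /` rule does. It uses Python's standard `urllib.robotparser`, which is only one interpretation of the rules and not necessarily how any particular crawler evaluates them. The helper name `robots_url` and the file contents are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical file contents: block CCBot from the whole site.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(robots_url("https://yourdomain.com/blog/post?id=1"))
# CCBot is blocked on every path; bots without a matching group are unaffected.
print(parser.can_fetch("CCBot", "https://yourdomain.com/blog/post"))      # False
print(parser.can_fetch("Googlebot", "https://yourdomain.com/blog/post"))  # True
```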

How to Use

How to Generate and Deploy Your robots.txt

  1. Set Permissions for Each Bot

    Toggle Allow or Block for each crawler. AI answer engines, AI training bots, and search engines are grouped separately for clarity.

  2. Add Global Disallow Paths (Optional)

    Block specific directories from all crawlers - common choices are /wp-admin/, /cart/, /checkout/, and /thank-you/.

  3. Generate, Copy, and Deploy

    Click "Generate robots.txt", copy the output, and upload it as a plain text file to your site root. Check it in Google Search Console → Settings → robots.txt report, which replaced the older robots.txt Tester.
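
For readers who prefer to script the steps above, here is a minimal sketch of how such a file could be assembled from per-bot choices and a list of global paths. It is not the generator's actual implementation: the function name `build_robots_txt`, its parameters, and the output layout are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(permissions: dict[str, bool], global_disallow: list[str]) -> str:
    """Build robots.txt text from per-bot allow/block choices.

    permissions maps a User-agent token to True (allow) or False (block).
    """
    lines: list[str] = []
    # Blocked bots get their own group that disallows the whole site.
    for bot, allowed in permissions.items():
        if not allowed:
            lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # Allowed bots need no group of their own: they fall through to the
    # wildcard group, so the global paths apply to them as well.
    lines.append("User-agent: *")
    if global_disallow:
        lines += [f"Disallow: {path}" for path in global_disallow]
    else:
        lines.append("Disallow:")  # an empty value disallows nothing
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    {"GPTBot": True, "CCBot": False, "Googlebot": True},
    ["/wp-admin/", "/cart/"],
)
print(robots_txt)

# Sanity-check the result with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("Googlebot", "https://yourdomain.com/cart/"))  # False
```

One design choice worth noting: a crawler that finds a group naming it specifically generally ignores the `User-agent: *` group, so this sketch deliberately gives allowed bots no group of their own. Otherwise the global disallow paths would have to be repeated inside every bot-specific group.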

Learn More

The AI Crawler Landscape: Who's Crawling Your Site

AI answer crawlers vs. AI training crawlers

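There are two distinct categories of AI bot. Answer crawlers (GPTBot, ClaudeBot, PerplexityBot) fetch pages to answer user queries in real time - blocking them means your content will not be cited in AI answers. Training crawlers (CCBot, Bytespider, Diffbot) harvest content to train language models - they may use your content without attribution. Many site owners allow the first group and block the second. Some vendors split these roles across several user agents (OpenAI, for example, documents GPTBot, OAI-SearchBot, and ChatGPT-User separately), so check each vendor's crawler documentation before deciding.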

GPTBot and the citation trade-off

Many site owners blocked GPTBot in 2023 as a precaution. The trade-off: blocking GPTBot keeps OpenAI's crawler away from your pages, so users asking ChatGPT questions in your niche are far less likely to see your site as a source. For most content sites, being cited by ChatGPT is more valuable than preventing scraping. Allowing GPTBot but blocking CCBot is usually the better call.

Combining robots.txt with llms.txt

robots.txt controls access. llms.txt, a newer proposed convention that not every crawler reads, controls framing. A well-configured site uses both: robots.txt to allow the right AI crawlers and block training scrapers, and llms.txt to tell allowed crawlers which pages to prioritise and how to describe the site. They are complementary - configure robots.txt first, then deploy llms.txt.

FAQs

Robots.txt FAQs

Does blocking a bot in robots.txt actually work?
For reputable crawlers - yes. Google, Bing, OpenAI, Anthropic, and Perplexity all state that their crawlers honour robots.txt. Scrapers and malicious bots may not. robots.txt is a directive, not an enforcement mechanism.
What happens if I block GPTBot?
OpenAI's GPTBot will stop crawling your pages. Content it would have collected is left out of OpenAI's data, and your site becomes less likely to appear as a source in ChatGPT answers.
Should I block CCBot?
If you do not want your content used to train LLMs without attribution, blocking CCBot is a commonly recommended first step. CCBot is operated by Common Crawl, whose public web archive is used as training data for many open and commercial LLMs.
Can I have multiple User-agent rules?
Yes. Each User-agent block applies only to that bot. You can allow Googlebot access everywhere while blocking CCBot from the entire site in the same file.
Do I need robots.txt if I use Yoast or Rank Math?
Both plugins generate a virtual robots.txt automatically. If you want custom AI bot rules, you'll need to either add them through the plugin's robots.txt editor or upload a physical robots.txt file to override the virtual one.
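
To make the answer about multiple User-agent rules concrete, the sketch below parses a file with one group per bot and checks each bot against the same URL. The file contents and URL are hypothetical, and Python's standard `urllib.robotparser` stands in for a real crawler's matching logic, which may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group per bot, separated by a blank line.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each group applies only to the bot it names.
for bot in ("Googlebot", "CCBot"):
    allowed = parser.can_fetch(bot, "https://yourdomain.com/any/page")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
# Googlebot: allowed
# CCBot: blocked
```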

Is Your Site Visible to AI & Search?

Minineo scans your entire website for search and AI visibility. See how Google, ChatGPT, and Gemini are finding your pages and fix what’s holding you back.
