AI Crawlers: GPTBot, ClaudeBot, PerplexityBot and more

AI crawlers are bots operated by LLM providers to discover and index pages for training or retrieval. Decide which to allow and which to block.

2026-06-19

AI crawlers are bots operated by LLM providers to discover and index pages on the web. Some crawl to train future models, some crawl to power live retrieval in chat answers, and some do both. Understanding who crawls you—and why—is a core part of AI-era SEO.

The major AI crawlers

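  • GPTBot — OpenAI’s crawler for collecting content that may be used to train its models
  • OAI-SearchBot — OpenAI’s search crawler, used to surface sites in ChatGPT search results; OpenAI states it is not used for training
  • ClaudeBot — Anthropic’s crawler, which collects public web content that may be used to train Claude models
  • PerplexityBot — Perplexity’s crawler, used to index and link pages in Perplexity’s search answers rather than to train models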
  • Google-Extended — Google’s opt-out token that controls whether your content is used for Gemini training
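  • Applebot-Extended — Apple’s opt-out token (Applebot does the actual crawling) that controls whether your content is used to train Apple’s foundation models, including those behind Apple Intelligence
  • Meta-ExternalAgent / Meta-ExternalFetcher — Meta’s crawlers for its AI products: Meta-ExternalAgent crawls for uses such as model training, while Meta-ExternalFetcher fetches individual links on behalf of users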
  • Bytespider — ByteDance’s crawler for Doubao / TikTok AI
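
To see which of these crawlers actually visit a site, a common first step is to match the User-Agent header in your access logs against their tokens. The Python sketch below assumes you have already extracted User-Agent strings from your logs; `AI_CRAWLER_TOKENS`, `identify_ai_crawler` and `count_ai_hits` are hypothetical names, and the sample strings are illustrative rather than the exact headers each crawler sends. Google-Extended and Applebot-Extended are left out on purpose: they are robots.txt control tokens, not separate crawlers, so they do not appear in request headers.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical, hand-maintained list of request User-Agent tokens from the list above.
# Google-Extended and Applebot-Extended are omitted: they are robots.txt tokens only.
AI_CRAWLER_TOKENS = (
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "PerplexityBot",
    "Meta-ExternalAgent",
    "Meta-ExternalFetcher",
    "Bytespider",
)


def identify_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the first known token found in a User-Agent string, or None.

    Matching is a case-insensitive substring check in case a crawler varies
    the capitalization of its token. The header can be spoofed, so a match
    is a hint, not proof of who made the request.
    """
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


def count_ai_hits(user_agents: Iterable[str]) -> Counter:
    """Count requests per AI crawler token, skipping agents that do not match."""
    hits: Counter = Counter()
    for ua in user_agents:
        token = identify_ai_crawler(ua)
        if token is not None:
            hits[token] += 1
    return hits


if __name__ == "__main__":
    # Illustrative User-Agent strings, not copied from real logs.
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "meta-externalagent/1.1",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
    ]
    print(count_ai_hits(sample))
```

Substring matching keeps the sketch short; a production log pipeline would usually also check the request IP against the ranges each provider publishes.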

How to manage AI crawlers

You have three options for each crawler:

  1. Allow — let it index your content. This is the default when robots.txt has no rule for the crawler, and the right choice for most publishers who want AI traffic
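  2. Block — add a Disallow rule for the user agent in robots.txt. Crawlers that honor robots.txt stop fetching your pages, which keeps new content out of that provider’s training data or live answers, depending on what the crawler is used for. robots.txt is advisory, and blocking does not remove content that was already collected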
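  3. Differentiate — treat each token according to its purpose. Google’s approach is the clearest example: a separate User-agent: Google-Extended group controls Gemini training while leaving Search unaffected. The same split works for OpenAI: block GPTBot (training) while allowing OAI-SearchBot (ChatGPT search)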
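
The three options can be checked before deploying by generating a robots.txt and asking Python’s standard `urllib.robotparser` module how it would treat each token. This is a minimal sketch: the `POLICY` dictionary and the helper names are hypothetical, the allow/block choices illustrate the Differentiate option rather than recommend it, and `urllib.robotparser` is a simple parser whose handling of edge cases may differ from a given crawler’s. It expects the product token (for example `GPTBot`), not a full User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical example policy: True = allow, False = block.
# GPTBot vs OAI-SearchBot and Google-Extended show the "Differentiate" option.
POLICY = {
    "GPTBot": False,           # OpenAI training crawler: blocked
    "OAI-SearchBot": True,     # OpenAI search crawler: allowed
    "Google-Extended": False,  # Gemini training opt-out; Googlebot is unaffected
    "ClaudeBot": True,
    "PerplexityBot": True,
}


def build_robots_txt(policy: dict) -> str:
    """Render one robots.txt group per token, plus a catch-all group."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    # Agents not named above fall back to this group.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def can_fetch(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Ask the standard-library parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    robots = build_robots_txt(POLICY)
    print(robots)
    for agent in ("GPTBot", "OAI-SearchBot", "Googlebot", "Google-Extended"):
        status = "allowed" if can_fetch(robots, agent) else "blocked"
        print(f"{agent:16} {status}")
```

With this policy the parser reports GPTBot and Google-Extended as blocked and OAI-SearchBot and Googlebot as allowed, which is exactly the training/search split described in option 3.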

Which option to choose

  • Publishers who want AI traffic: allow all major crawlers by default. An llms.txt file (a proposed convention, not a standard every crawler reads) can point them to your best pages
  • Sites with paywalls or sensitive content: block selectively by user agent or by path. Structured data can label gated content, but it does not stop a crawler from reading any page robots.txt lets it fetch
  • Anyone: review your robots.txt at least quarterly, since new AI user agents appear regularly
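
A quarterly review can start by listing which known AI tokens your robots.txt never names, because those crawlers fall back to the `User-agent: *` group (or face no rules at all if there is none). The sketch below is a hypothetical helper, not part of any standard tool: `KNOWN_AI_TOKENS` is a hand-maintained copy of the list above that you would extend as new crawlers appear, and the parser only reads `User-agent` lines, case-insensitively, ignoring comments.

```python
# Hypothetical, hand-maintained list of robots.txt tokens from this article.
KNOWN_AI_TOKENS = (
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
    "Applebot-Extended",
    "Meta-ExternalAgent",
    "Meta-ExternalFetcher",
    "Bytespider",
)


def named_agents(robots_txt: str) -> set:
    """Collect every User-agent value, lower-cased, ignoring # comments."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def unlisted_tokens(robots_txt: str, known=KNOWN_AI_TOKENS) -> list:
    """Return known tokens that have no group of their own in robots_txt."""
    named = named_agents(robots_txt)
    return [token for token in known if token.lower() not in named]


if __name__ == "__main__":
    current = (
        "User-agent: GPTBot\n"
        "Disallow: /\n"
        "\n"
        "User-agent: Google-Extended  # Gemini training\n"
        "Disallow: /\n"
        "\n"
        "User-agent: *\n"
        "Allow: /\n"
    )
    for token in unlisted_tokens(current):
        print(f"No explicit rule for {token}; it follows the '*' group")
```

Listing unnamed tokens, rather than judging whether each rule is right, keeps the audit neutral: the output is a to-do list of crawlers you have not yet made a deliberate choice about.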
