Technical — AI Crawlers

Who Actually Reads Your Site

Before a human ever finds you, a bot decides whether you exist. In 2026 that's no longer just Googlebot — it's a fleet of AI crawlers feeding ChatGPT, Perplexity, Gemini and Claude. Block them and you vanish from AI answers. Welcome them, and every crawl becomes a chance to be cited.

STEP 1

Crawl

A bot reads your page and stores what it understands about your entity.

STEP 2

Understand

Schema + clear content tell it you are an independent Insurance Agency in Cebu.

STEP 3

Cite

When a user asks an AI, it names and links you — visibility without a classic click.
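The "Understand" step is usually delivered as JSON-LD in the page head. Below is a minimal sketch of how that markup could be generated, assuming the schema.org `InsuranceAgency` type fits the business. The helper name `build_agency_jsonld` is hypothetical, and the field values are taken from the examples later on this page.

```python
import json


def build_agency_jsonld(name: str, url: str, area_served: str, email: str) -> str:
    """Return a JSON-LD string describing a local insurance agency (illustrative helper)."""
    entity = {
        "@context": "https://schema.org",
        "@type": "InsuranceAgency",  # assumption: this schema.org type matches the business
        "name": name,
        "url": url,
        "areaServed": area_served,
        "email": email,
    }
    return json.dumps(entity, indent=2, ensure_ascii=False)


snippet = build_agency_jsonld(
    name="x-consultant",
    url="https://x-consultant.com",
    area_served="Cebu City, Philippines",
    email="hello@x-consultant.com",
)

# Wrap the JSON in the script tag that crawlers look for in the page head.
print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```

Generating the markup from one dictionary keeps the name, area and contact details identical everywhere they are published.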

The crawlers that matter in 2026

Most should be welcomed — each one is a distribution channel for your brand entity.

Googlebot

Google Search

Classic indexing — powers blue links, the map pack and AI Overviews sourcing.

ALLOW: Never block.

Google-Extended

Google / Gemini

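A robots.txt control token rather than a separate crawler: it governs whether pages Google has already crawled may be used for Gemini training and grounding. It does not affect Google Search or AI Overviews, which follow Googlebot. Allowing it keeps you eligible for Gemini answers.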

ALLOW: Allow for visibility.

OAI-SearchBot

OpenAI (ChatGPT Search)

Indexes pages so ChatGPT Search can surface and link to you.

ALLOW: Allow; it drives citations.

GPTBot

OpenAI (training)

Collects pages for model training. Allowing it grows brand presence inside the model itself.

ALLOW: Allow for reach.

ChatGPT-User

OpenAI (live fetch)

Fetches a page in real time when a user asks ChatGPT about it.

ALLOW

PerplexityBot

Perplexity

Indexes and cites sources in Perplexity answers — a fast-growing referral source.

ALLOW

ClaudeBot

Anthropic (Claude)

Crawls for Claude’s knowledge and web answers.

ALLOW

Bingbot

Microsoft / Copilot

Powers Bing and Copilot — the second search ecosystem and an AI answer engine.

ALLOW

Applebot

Apple (Siri / Spotlight)

Feeds Siri suggestions and Spotlight.

ALLOW

Bytespider

ByteDance / TikTok

Aggressive crawler for ByteDance AI. Often throttled — allow selectively if server load is a concern.

OPTIONAL: Rate-limit if needed.
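To see which of these crawlers actually visit, you can match the User-Agent header in your access logs against the tokens above. The sketch below is one hedged way to do that. The helper `identify_crawler` and the sample strings are illustrative, not taken from real logs. Google-Extended is left out because it is a robots.txt token and does not appear as its own user-agent string. A User-Agent header can be spoofed, so treat the result as a hint rather than proof.

```python
from collections import Counter
from typing import Optional, Tuple

# Tokens and operators taken from the crawler list above.
# ChatGPT-User and OAI-SearchBot are listed before GPTBot so the more
# specific OpenAI tokens are checked first.
CRAWLER_TOKENS = {
    "Googlebot": "Google Search",
    "ChatGPT-User": "OpenAI (live fetch)",
    "OAI-SearchBot": "OpenAI (ChatGPT Search)",
    "GPTBot": "OpenAI (training)",
    "PerplexityBot": "Perplexity",
    "ClaudeBot": "Anthropic (Claude)",
    "bingbot": "Microsoft / Copilot",
    "Applebot": "Apple (Siri / Spotlight)",
    "Bytespider": "ByteDance / TikTok",
}


def identify_crawler(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (token, operator) for a known crawler, or None for anything else."""
    ua = user_agent.lower()
    for token, operator in CRAWLER_TOKENS.items():
        if token.lower() in ua:  # case-insensitive substring match
            return token, operator
    return None


# Illustrative User-Agent strings (assumed shapes, not copied from real traffic).
sample_user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
]

hits = Counter()
for ua in sample_user_agents:
    match = identify_crawler(ua)
    hits[match[0] if match else "other"] += 1

for name, count in hits.items():
    print(f"{name}: {count}")
```

A tally like this shows whether Bytespider is heavy enough to justify rate-limiting before you change anything.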

robots.txt

The front door. It tells every bot what it may crawl. The sovereign default: open to search and AI engines, with a clear pointer to your sitemap.

robots.txt

# robots.txt, cases.skaly.tech style: open to search + AI, point to sitemap

User-agent: *
Allow: /

# Search + AI answer engines (explicitly welcomed)
User-agent: Googlebot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: GPTBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /

Sitemap: https://x-consultant.com/sitemap.xml
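Before deploying a robots.txt, it helps to confirm it really admits the bots you want. The sketch below uses Python's standard `urllib.robotparser` to check each agent against a URL and list the declared sitemaps. The function name `audit_robots` and the two sample files are assumptions for illustration, and `site_maps()` requires Python 3.8 or later.

```python
from typing import Dict, List, Tuple
from urllib.robotparser import RobotFileParser

# Agents named in this guide; adjust the list to taste.
AI_AGENTS = [
    "Googlebot", "Google-Extended", "OAI-SearchBot", "GPTBot",
    "ChatGPT-User", "PerplexityBot", "ClaudeBot",
]


def audit_robots(robots_txt: str, url: str) -> Tuple[Dict[str, bool], List[str]]:
    """Return ({agent: allowed?}, [sitemap URLs]) for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    access = {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}
    return access, parser.site_maps() or []  # site_maps() is None when absent


# Shortened version of the open policy shown above.
OPEN_POLICY = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

Sitemap: https://x-consultant.com/sitemap.xml
"""

# Hypothetical policy that shuts out the training crawler only.
BLOCKING_POLICY = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

page = "https://x-consultant.com/hmo-nurses-cebu"
for label, policy in [("open", OPEN_POLICY), ("blocking", BLOCKING_POLICY)]:
    access, sitemaps = audit_robots(policy, page)
    blocked = [agent for agent, ok in access.items() if not ok]
    print(label, "blocked:", blocked, "sitemaps:", sitemaps)
```

Python's parser is only one interpretation of the rules; individual crawlers may resolve edge cases differently, so this is a sanity check rather than a guarantee.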

llms.txt

The AI-era addition. A plain-language brief that tells language models who you are, your key pages and how to describe you — so they cite you correctly, not as a reseller.

llms.txt

# x-consultant — Insurance Agency in Cebu City, Philippines
> Independent insurance agency helping families, nurses and BPO workers
> in Cebu find the right HMO, life and health coverage. Free consultations.

## Key pages
- [HMO for nurses in Cebu](https://x-consultant.com/hmo-nurses-cebu): plans, claims, no deposit
- [Insurance advisor Cebu](https://x-consultant.com/advisor-cebu): book a free consultation
- [About](https://x-consultant.com/about): credentials, experience, licensing

## Entity
- Type: Insurance Agency
- Area served: Cebu City, Philippines
- Contact: +63 ... · hello@x-consultant.com
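Because llms.txt is plain Markdown, you can check what a machine reader would extract from it. The sketch below is a minimal parser under stated assumptions: it treats the first `#` line as the title (which is how a Markdown reader would see it), `>` lines as the summary, and `- [title](url): note` bullets under each `##` heading as links. The function `parse_llms_txt` and the shortened sample are illustrative, not part of any official tooling.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches "- [Title](https://example.com/page): optional note"
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text: str) -> dict:
    """Extract title, summary and per-section links from an llms.txt body."""
    title: Optional[str] = None
    summary: List[str] = []
    sections: Dict[str, List[Tuple[str, str, str]]] = {}
    current: Optional[str] = None

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("# ") and title is None:
            title = line[2:].strip()  # only the first H1 counts as the title
        elif line.startswith(">"):
            summary.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:  # plain bullets such as "- Type: ..." are skipped
                sections[current].append(
                    (match.group(1), match.group(2), match.group(3) or "")
                )

    return {"title": title, "summary": " ".join(summary), "sections": sections}


# Shortened sample based on the file above.
SAMPLE = """\
# x-consultant
> Independent insurance agency in Cebu.
> Free consultations.

## Key pages
- [About](https://x-consultant.com/about): credentials, experience, licensing

## Entity
- Type: Insurance Agency
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed["title"])
print(parsed["summary"])
for name, url, note in parsed["sections"]["Key pages"]:
    print(name, url, note)
```

If the printed title is not your brand name, the file's first heading is the likely culprit.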

Blocking AI crawlers is opting out of the future SERP

Some brands block GPTBot and Google-Extended to “protect” content. For a local service business that's backwards: you want the model to know you exist, what you do and where. The agencies that get named in “best insurance advisor in Cebu” AI answers are the ones that let the crawlers in and gave them clean structured data to read.
