Who Actually Reads Your Site
Before a human ever finds you, a bot decides whether you exist. In 2026 that's no longer just Googlebot: it's a fleet of AI crawlers feeding ChatGPT, Perplexity, Gemini and Claude. Block them and you risk vanishing from AI answers. Welcome them, and every crawl becomes a chance to be cited.
Crawl
A bot reads your page and stores what it understands about your entity.
Understand
Schema + clear content tell it you are an independent Insurance Agency in Cebu.
Cite
When a user asks an AI, it names and links you — visibility without a classic click.
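The "Understand" step is usually fed with schema.org structured data embedded in the page as JSON-LD. A minimal sketch in Python, assuming the `InsuranceAgency` type from schema.org fits the business; the helper names `build_agency_jsonld` and `as_script_tag` are hypothetical, and the chosen properties are an illustrative minimum, not a complete profile:

```python
import json


def build_agency_jsonld(name, url, city, country, description):
    """Return a schema.org InsuranceAgency description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "InsuranceAgency",
        "name": name,
        "url": url,
        "description": description,
        "areaServed": {"@type": "City", "name": city},
        "address": {
            "@type": "PostalAddress",
            "addressLocality": city,
            "addressCountry": country,
        },
    }


def as_script_tag(data):
    """Wrap the JSON-LD dict in the script tag that is placed in the page HTML."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Values below come from this article's example business.
agency = build_agency_jsonld(
    name="x-consultant",
    url="https://x-consultant.com/",
    city="Cebu City",
    country="PH",
    description=(
        "Independent insurance agency helping families, nurses and BPO "
        "workers in Cebu find the right HMO, life and health coverage."
    ),
)

print(as_script_tag(agency))
```

The printed tag can be pasted into the page template. Keeping the data in one dict makes it easier to keep the schema and the visible page copy saying the same thing.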
The crawlers that matter in 2026
Most should be welcomed — each one is a distribution channel for your brand entity.
Googlebot · Google Search
Classic indexing — powers blue links, the map pack and AI Overviews sourcing.
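Google-Extended · Google / Gemini
A robots.txt control token rather than a separate crawler. It governs whether Google may use your content for Gemini models, and it is allowed by default, so blocking it is the opt-out. Google states that it does not affect inclusion or ranking in Google Search; AI Overviews are part of Search and follow your Googlebot rules.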
OAI-SearchBot · OpenAI (ChatGPT Search)
Indexes pages so ChatGPT Search can surface and link to you.
GPTBot · OpenAI (training)
Collects pages for model training. Allowing it can help the model itself learn who you are and what you do.
ChatGPT-User · OpenAI (live fetch)
Fetches a page in real time when a user asks ChatGPT about it.
PerplexityBot · Perplexity
Indexes and cites sources in Perplexity answers — a fast-growing referral source.
ClaudeBot · Anthropic (Claude)
Crawls for Claude’s knowledge and web answers.
Bingbot · Microsoft / Copilot
Powers Bing and Copilot — the second search ecosystem and an AI answer engine.
Applebot · Apple (Siri / Spotlight)
Feeds Siri suggestions and Spotlight.
Bytespider · ByteDance / TikTok
Aggressive crawler for ByteDance AI. Often throttled — allow selectively if server load is a concern.
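To see which of these crawlers actually visit, you can match the User-Agent strings in your server logs against the tokens listed above. A rough sketch, assuming you have already extracted the User-Agent field from each log line; `identify_crawler` and `count_crawler_hits` are hypothetical helper names, and the sample strings are abbreviated illustrations rather than exact copies of what each bot sends. Google-Extended is left out because it is a robots.txt token and does not appear as a User-Agent in requests.

```python
from collections import Counter

# Crawler tokens from the list above, mapped to the operator named there.
KNOWN_CRAWLERS = {
    "Googlebot": "Google Search",
    "OAI-SearchBot": "OpenAI (ChatGPT Search)",
    "GPTBot": "OpenAI (training)",
    "ChatGPT-User": "OpenAI (live fetch)",
    "PerplexityBot": "Perplexity",
    "ClaudeBot": "Anthropic (Claude)",
    "Bingbot": "Microsoft / Copilot",
    "Applebot": "Apple (Siri / Spotlight)",
    "Bytespider": "ByteDance / TikTok",
}


def identify_crawler(user_agent):
    """Return the matching crawler token, or None if no known token is found.

    This is a case-insensitive substring match. User-Agent headers can be
    spoofed, so treat the result as a hint, not as proof of identity.
    """
    ua = user_agent.lower()
    for token in KNOWN_CRAWLERS:
        if token.lower() in ua:
            return token
    return None


def count_crawler_hits(user_agents):
    """Count hits per crawler token; everything else is grouped as 'other'."""
    return Counter(identify_crawler(ua) or "other" for ua in user_agents)


# Abbreviated, illustrative User-Agent strings (assumed, not real log data).
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0",
]

for name, hits in count_crawler_hits(SAMPLE_USER_AGENTS).most_common():
    operator = KNOWN_CRAWLERS.get(name, "human or unknown client")
    print(f"{name:15} {hits:3}  {operator}")
```

If a bot such as Bytespider dominates the counts, that is the signal to throttle it selectively instead of blocking everything.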
robots.txt
The front door. It tells every bot what it may crawl. The sovereign default: open to search and AI engines, with a clear pointer to your sitemap.
# robots.txt — cases.skaly.tech style: open to search + AI, point to sitemap & llms.txt
User-agent: *
Allow: /

# Search + AI answer engines (explicitly welcomed)
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /
Sitemap: https://x-consultant.com/sitemap.xml
llms.txt
The AI-era addition, and still a proposed convention rather than a formal standard. A plain-language brief that tells language models who you are, your key pages and how to describe you, so they cite you correctly, not as a reseller.
# llms.txt — tells AI engines who you are and what to cite
# x-consultant — Insurance Agency in Cebu City, Philippines
> Independent insurance agency helping families, nurses and BPO workers
> in Cebu find the right HMO, life and health coverage. Free consultations.
## Key pages
- [HMO for nurses in Cebu](https://x-consultant.com/hmo-nurses-cebu): plans, claims, no deposit
- [Insurance advisor Cebu](https://x-consultant.com/advisor-cebu): book a free consultation
- [About](https://x-consultant.com/about): credentials, experience, licensing
## Entity
- Type: Insurance Agency
- Area served: Cebu City, Philippines
- Contact: +63 ... · hello@x-consultant.com
Blocking AI crawlers is opting out of the future SERP
Some brands block GPTBot and Google-Extended to “protect” content. For a local service business that's backwards: you want the model to know you exist, what you do and where. The agencies that get named in “best insurance advisor in Cebu” AI answers are the ones that let the crawlers in and gave them clean structured data to read.
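Before assuming the crawlers are let in, it is worth checking what your robots.txt actually permits. A minimal sketch using Python's standard `urllib.robotparser`, with the rules from the robots.txt section inlined as a string; the helper name `check_access` is hypothetical, and the standard-library parser only approximates how each real crawler interprets the file:

```python
from urllib.robotparser import RobotFileParser

# The policy from the robots.txt section, inlined so the check runs offline.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

Sitemap: https://x-consultant.com/sitemap.xml
"""

BOTS = [
    "Googlebot",
    "Google-Extended",
    "OAI-SearchBot",
    "GPTBot",
    "PerplexityBot",
    "ClaudeBot",
]


def check_access(robots_txt, bots, url):
    """Return {bot: True/False} for whether each bot may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


results = check_access(ROBOTS_TXT, BOTS, "https://x-consultant.com/hmo-nurses-cebu")
for bot, allowed in results.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

To test the live file instead of the inlined string, `RobotFileParser` also offers `set_url()` followed by `read()`. Running the same check against a policy that contains `Disallow: /` for a bot shows at a glance which crawlers a "protective" rule would shut out.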