Search Lighthouse

How to check robots.txt, sitemap, and canonical tags

The robots.txt file, the XML sitemap, and canonical tags tell search engines which URLs they may crawl and which version of each page you prefer.

Problem

Small configuration mistakes can block crawling, keep pages out of the index, or send search engines contradictory signals.

Symptoms

Important pages are missing from search.
Sitemap discovery fails.
Canonical tags point to unexpected URLs.

How to diagnose

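Fetch robots.txt and check which paths it disallows.
Find sitemap URLs from the Sitemap lines in robots.txt and from common paths such as /sitemap.xml.
Inspect the canonical tags on the homepage and on a sample of other pages.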
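
The first two steps can be sketched with Python's standard library. This is a minimal illustration, not how Search Lighthouse implements its checks: the function name check_robots, the example.com URLs, and the fallback list of common sitemap paths are all assumptions made for the example. The robots.txt text is passed in as a string so the sketch runs offline; in practice you would fetch it first.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Assumed fallback locations; these are common conventions, not a standard.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def check_robots(robots_txt, base_url, important_paths, user_agent="*"):
    """Report which important paths are disallowed and where to look for sitemaps."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Paths the given user agent is not allowed to fetch.
    blocked = [
        path for path in important_paths
        if not parser.can_fetch(user_agent, urljoin(base_url, path))
    ]

    # site_maps() returns None when robots.txt declares no Sitemap lines,
    # in which case we fall back to the assumed common paths.
    declared = parser.site_maps() or []
    sitemaps = declared or [urljoin(base_url, p) for p in COMMON_SITEMAP_PATHS]
    return {"blocked": blocked, "sitemaps": sitemaps}


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow: /blog/\n"
        "Sitemap: https://example.com/sitemap.xml\n"
    )
    print(check_robots(sample, "https://example.com", ["/", "/blog/post-1", "/pricing"]))
```

Python's parser uses simple prefix matching, so its verdict may differ from a search engine's own parser when robots.txt relies on wildcard patterns. Treat the result as a first pass rather than a final answer.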

How to fix

Allow important paths in robots.txt.
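Submit a sitemap that lists only the indexable, canonical URLs you want in search.
Use one canonical URL per page and avoid contradictory metadata, such as a canonical tag that points elsewhere on a page that is also marked noindex.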
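
After applying the canonical fix, a quick script can confirm that a page declares exactly one canonical URL and that its metadata does not contradict it. The sketch below is one possible check under stated assumptions: the names HeadSignals and canonical_issues are hypothetical, the URLs are placeholders, and treating noindex combined with a canonical to a different URL as a contradiction is a rule chosen for this example.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects canonical link targets and the robots meta noindex flag."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; normalise them to "".
        attr = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonicals.append(attr.get("href", ""))
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def canonical_issues(html, page_url):
    """Return a list of human-readable problems found in the page markup."""
    signals = HeadSignals()
    signals.feed(html)
    targets = set(signals.canonicals)

    issues = []
    if not targets:
        issues.append("missing canonical tag")
    if len(targets) > 1:
        issues.append("multiple canonical URLs")
    # Assumption: noindex plus a canonical to a different URL is contradictory.
    if signals.noindex and any(target != page_url for target in targets):
        issues.append("noindex combined with canonical to another URL")
    return issues


if __name__ == "__main__":
    page = (
        '<head><link rel="canonical" href="https://example.com/a">'
        '<link rel="canonical" href="https://example.com/b">'
        '<meta name="robots" content="noindex, follow"></head>'
    )
    print(canonical_issues(page, "https://example.com/a"))
```

The comparison is an exact string match, so differences such as a trailing slash or http versus https are reported as a different URL. That strictness is deliberate here, since those differences are a common source of unexpected canonicals.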

How Search Lighthouse helps

Search Lighthouse runs these checks together, so a single report shows whether the issue lies in robots.txt, the sitemap, canonical tags, or page metadata.

Related guides

How to fix canonical and sitemap host mismatch

Host mismatch happens when your sitemap, canonical tags, and live URLs disagree about the preferred domain.

Why Google crawled but did not index your pages

Crawled but not indexed usually means discovery worked, but page quality, duplication, or signals did not justify indexing.

How to improve indexability for AI-built websites

AI-built websites often ship fast, but search engines still need stable templates, links, and unique page value.