robots.txt is a plain-text file placed at the root of your domain (e.g., yourdomain.com/robots.txt) that tells web crawlers which pages they're allowed to access. A single misconfiguration — a Disallow: / left over from development — can make your entire site invisible to Google. Understanding robots.txt syntax isn't optional for anyone managing an SEO-dependent website.
How robots.txt works
When Googlebot visits your site, it first requests yourdomain.com/robots.txt. If the file exists, it reads the directives to determine which paths it's allowed to crawl; if the file doesn't exist (the request returns a 404), Googlebot assumes there are no restrictions and crawls freely. Robots.txt directives are voluntary — malicious crawlers ignore them entirely. Robots.txt only affects legitimate, well-behaved crawlers like Googlebot, Bingbot, and others that respect the standard.
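You can simulate this crawl-permission check with Python's standard-library robots.txt parser. A minimal sketch — the rules and yourdomain.com URLs are placeholder examples, not a real site:

```python
# Sketch: check whether a crawler may fetch a path under a given robots.txt,
# using Python's standard-library parser (urllib.robotparser).
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The admin path is blocked; everything else falls through to "allowed".
print(rp.can_fetch("Googlebot", "https://yourdomain.com/admin/users"))  # False (blocked)
print(rp.can_fetch("Googlebot", "https://yourdomain.com/products/"))    # True (allowed)
```

Note that with no robots.txt at all (a 404 on fetch), the parser — like Googlebot — treats every path as crawlable.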
⚠️ Warning
Robots.txt controls crawling, not indexing. A page blocked by robots.txt can still appear in Google's index if other pages link to it — Google knows the URL exists from those links even though it can't crawl the content. To prevent indexing, use a noindex meta tag, which requires the page to be crawlable: if robots.txt blocks the page, Googlebot never sees the noindex directive. Robots.txt and noindex serve different purposes.
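For reference, the noindex mechanism is a robots meta tag in the page markup — a minimal sketch:

```html
<!-- In the <head> of any page that must not appear in search results -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is the `X-Robots-Tag: noindex` HTTP response header. Either way, the page must stay crawlable so Googlebot can actually see the directive.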
Basic robots.txt syntax
# robots.txt — basic structure
# Apply rules to all crawlers
User-agent: *
Disallow: /admin/ # Block the admin section
Disallow: /private/ # Block private files
Allow: / # Allow everything else
# Apply different rules to Googlebot specifically
User-agent: Googlebot
Disallow: /internal-tools/
# Point crawlers to your sitemap
Sitemap: https://yourdomain.com/sitemap.xml
The most common robots.txt directives
- User-agent: * — applies the following rules to all crawlers
- User-agent: Googlebot — applies rules only to Google's crawler
- Disallow: /path/ — blocks crawlers from accessing this path and all sub-paths
- Allow: /path/ — explicitly permits a path that might otherwise be blocked by a broader Disallow rule
- Sitemap: URL — points crawlers to your XML sitemap location
- Crawl-delay: 10 — asks crawlers to wait 10 seconds between requests (note: Googlebot ignores this directive entirely, though other crawlers such as Bingbot may honor it)
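The interaction between Allow and Disallow is worth seeing concretely. The sketch below (hypothetical paths) carves a single allowed page out of a blocked directory. One caveat: Google resolves conflicts by the most specific (longest) matching rule, while Python's urllib.robotparser applies rules in file order with first match winning — placing the Allow line first gives the same result under both interpretations:

```python
# Sketch: an Allow rule carving an exception out of a Disallowed directory.
# Allow is listed first because Python's parser is order-sensitive;
# Google's longest-match rule would reach the same conclusion either way.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /private/status.html
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://yourdomain.com/private/status.html"))  # True (exception)
print(rp.can_fetch("*", "https://yourdomain.com/private/other.html"))   # False (blocked)
```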
Critical robots.txt mistakes
Mistake 1: Blocking the entire site
This is the most catastrophic robots.txt error: a blanket Disallow: / set during development to keep Google away from the unfinished site, then never removed at launch. Result: zero indexed pages, zero organic traffic.
# DANGEROUS — blocks all crawlers from everything
User-agent: *
Disallow: /
# CORRECT — allows full crawl access
User-agent: *
Allow: /
Mistake 2: Blocking CSS and JavaScript
Blocking /wp-content/ or static asset directories prevents Googlebot from rendering your pages correctly. If Google can't load your CSS and JavaScript, it sees a broken, unrendered version of your site — which can hurt rankings significantly.
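A sketch of the difference, using hypothetical WordPress-style paths:

```text
# RISKY — blocks theme and plugin CSS/JS that Googlebot needs to render pages
User-agent: *
Disallow: /wp-content/

# SAFER — block only what shouldn't be crawled; leave assets reachable
User-agent: *
Disallow: /wp-content/cache/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/
```

The explicit Allow lines are a safeguard: they keep rendering assets crawlable even if a broader Disallow is added to that group later.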
Mistake 3: Using robots.txt instead of noindex for sensitive pages
If you want a page to not appear in Google's index, blocking it in robots.txt doesn't guarantee that. Google may still list the URL in search results if other sites link to it — it just can't read the content. Use noindex meta tags for pages that must not appear in search results.
How to test your robots.txt
Google Search Console provides a robots.txt tester under Settings → robots.txt. Enter any URL on your site and it will tell you whether Googlebot can crawl it based on your current rules. Always test before deploying changes to robots.txt — a typo in a path can block thousands of pages.
- Check GSC Settings → robots.txt to view and test your current file
- Test every critical URL type: homepage, product pages, blog posts, sitemap
- After any robots.txt change, submit the updated file via the GSC robots.txt report
- Monitor GSC → Coverage report for spikes in 'Blocked by robots.txt' errors after changes
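The testing checklist above can also be automated in a pre-deploy script. A sketch using Python's standard-library parser — the domain and URL list are placeholders for your own critical pages:

```python
# Sketch: batch-check critical URL types against a robots.txt before deploying.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
"""

# One representative URL per critical page type.
critical_urls = [
    "https://yourdomain.com/",                      # homepage
    "https://yourdomain.com/products/widget-42",    # product page
    "https://yourdomain.com/blog/launch-checklist", # blog post
    "https://yourdomain.com/sitemap.xml",           # sitemap
]

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
# Against a live site, use rp.set_url(".../robots.txt"); rp.read() instead.

for url in critical_urls:
    status = "crawlable" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{status:10} {url}")
```

Wiring this into CI turns the "test before deploying" rule into an automatic gate: fail the build if any critical URL comes back BLOCKED.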
robots.txt best practices
- Always include a Sitemap: directive pointing to your XML sitemap
- Block admin, login, and internal tool paths from all crawlers
- Do not block CSS, JavaScript, or font files — Google needs them to render your pages
- Use the GSC robots.txt tester before deploying any change to production
- Remove development Disallow: / rules before launch — set a deployment checklist item
- Keep the file simple — complex robots.txt files with many conflicting rules cause unpredictable behavior
💡 Tip
Practice this in the game: Chapter 1-1 (The Silent Launch) puts you in the middle of a Disallow: / disaster — a 2,000-product e-commerce store invisible to Google because of one line in robots.txt.