An XML sitemap is a file that lists the URLs on your site you want search engines to crawl and index. It doesn't guarantee indexation — Google treats it as a suggestion, not a command — but it's one of the most reliable ways to ensure your important pages are discovered and re-crawled after updates. A common misconception is that submitting a sitemap is a one-time setup task. In practice, it's an ongoing signal that needs to be maintained as your site grows.
What to include in your XML sitemap
Only include URLs you actively want Google to index. This sounds obvious, but most auto-generated sitemaps include pages they shouldn't: paginated results, parameterised filter URLs, thin tag archives, and pages with noindex tags. Including a noindexed page in your sitemap is a direct contradiction that confuses Googlebot and wastes crawl budget.
- Include: canonical versions of all indexable pages — posts, products, category pages, landing pages
- Include: pages updated frequently (Google uses sitemap lastmod to prioritise re-crawls)
- Exclude: URLs with noindex tags — never include a page in your sitemap that also has noindex
- Exclude: parameterised URLs (?sort=, ?filter=, ?page=) unless they're your canonical versions
- Exclude: redirect source URLs — only include the final destination
- Exclude: pages blocked by robots.txt — Googlebot won't crawl them anyway
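The include/exclude rules above can be sketched as a filter. This is a minimal Python sketch under assumptions: the page records, the noindex/redirect flags, and the parameter names treated as non-canonical are all illustrative — in practice this metadata would come from your CMS or a crawl.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical page metadata; in practice this comes from your CMS or a crawl.
PAGES = [
    {"url": "https://example.com/blog/technical-seo-guide", "noindex": False, "redirects_to": None},
    {"url": "https://example.com/shop?sort=price", "noindex": False, "redirects_to": None},
    {"url": "https://example.com/tag/misc", "noindex": True, "redirects_to": None},
    {"url": "https://example.com/old-guide", "noindex": False,
     "redirects_to": "https://example.com/blog/technical-seo-guide"},
]

# Parameter names treated as non-canonical (matches the ?sort=/?filter=/?page= rule).
TRACKING_PARAMS = {"sort", "filter", "page"}

def sitemap_eligible(page: dict) -> bool:
    """Apply the include/exclude rules to one page record."""
    if page["noindex"]:               # noindex pages never belong in a sitemap
        return False
    if page["redirects_to"]:          # list the final destination, not the redirect source
        return False
    params = parse_qs(urlparse(page["url"]).query)
    if TRACKING_PARAMS & params.keys():  # parameterised filter/sort/pagination URLs
        return False
    return True

eligible = [p["url"] for p in PAGES if sitemap_eligible(p)]
print(eligible)  # only the canonical blog URL survives
```

Only the first record passes all three checks; the parameterised, noindexed, and redirecting URLs are dropped.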
⚠️ Warning
Having a URL in your sitemap that also carries a noindex directive is one of the most common conflicts surfaced in GSC. Google reports it in the Page indexing report as "Excluded by 'noindex' tag" (the separate "Indexed, though blocked by robots.txt" status is caused by robots.txt rules, not by noindex — don't conflate the two). Audit your sitemap against your noindex tags quarterly.
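That quarterly audit is straightforward to automate. A minimal sketch, assuming you already have the sitemap XML and each page's HTML in hand (the inline sample data here is illustrative; a real audit would fetch each URL):

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>"""

# Hypothetical fetched HTML; in a real audit you would request each URL.
FETCHED_HTML = {
    "https://example.com/a": "<html><head><title>A</title></head></html>",
    "https://example.com/b": '<html><head><meta name="robots" content="noindex, follow"></head></html>',
}

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def noindex_conflicts(sitemap_xml: str, html_by_url: dict) -> list:
    """Return sitemap URLs whose pages also carry a robots noindex directive."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]
    # Matches <meta name="robots" content="...noindex..."> regardless of attribute quoting.
    pattern = re.compile(r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex', re.I)
    return [u for u in urls if pattern.search(html_by_url.get(u, ""))]

print(noindex_conflicts(SITEMAP_XML, FETCHED_HTML))  # ['https://example.com/b']
```

Any URL this returns should be removed from the sitemap or have its noindex tag removed, depending on which signal was the mistake.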
Sitemap format and the lastmod attribute
The lastmod attribute tells Google when the page was last substantively updated. Google uses this to decide when to re-crawl a page. If you set lastmod to today's date on every page regardless of actual changes, Google will quickly learn to ignore it — it becomes noise. Only update lastmod when the page content has genuinely changed.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/technical-seo-guide</loc>
    <lastmod>2026-04-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
<!-- Note: changefreq and priority are largely ignored by Google.
loc and lastmod are the only attributes worth maintaining. -->

Sitemap index files for large sites
A single XML sitemap can contain a maximum of 50,000 URLs and must be under 50MB uncompressed. For larger sites, use a sitemap index file that references multiple child sitemaps — one per content type (posts, products, categories, images). This also makes it easier to monitor indexation by content type in Google Search Console.
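The split can be sketched in a few lines. This is a minimal generator under assumptions: the `sitemap-1.xml` naming scheme and the 120,000-URL example are illustrative, and a production version would also enforce the 50MB limit and escape URLs:

```python
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemaps.org protocol

def build_sitemap(urls):
    """Render one child sitemap for up to MAX_URLS URLs."""
    entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>')

def build_index(base_url, count, lastmod=None):
    """Render a sitemap index pointing at `count` child files (file names are illustrative)."""
    lastmod = lastmod or date.today().isoformat()
    entries = "\n".join(
        f"  <sitemap><loc>{base_url}/sitemap-{i}.xml</loc><lastmod>{lastmod}</lastmod></sitemap>"
        for i in range(1, count + 1))
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}\n</sitemapindex>')

all_urls = [f"https://example.com/p/{i}" for i in range(120_000)]
chunks = [all_urls[i:i + MAX_URLS] for i in range(0, len(all_urls), MAX_URLS)]
children = [build_sitemap(c) for c in chunks]  # 120,000 URLs -> 3 child sitemaps
index_xml = build_index("https://example.com", len(children))
```

Splitting by content type instead of by count (one chunk per posts/products/categories) uses the same structure and gives you the per-type GSC monitoring described above.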
<!-- sitemap-index.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-04-17</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-04-17</lastmod>
  </sitemap>
</sitemapindex>

How to submit your sitemap to Google
There are two ways to submit: via Google Search Console (recommended) and via your robots.txt file. GSC submission gives you crawl data, error reports, and a submission history. The robots.txt method ensures Googlebot discovers the sitemap even without a GSC submission — use both.
- GSC: go to Sitemaps → enter your sitemap URL → Submit. Google will show status, last crawled date, and URL counts.
- robots.txt: add Sitemap: https://example.com/sitemap.xml at the bottom of your robots.txt file
- Don't bother pinging: the https://www.google.com/ping?sitemap= endpoint was deprecated in 2023 and has since been shut down — rely on accurate lastmod values and GSC resubmission instead
- Resubmit after major content restructures — a new sitemap submission signals that the site structure has changed
Diagnosing sitemap errors in GSC
After submission, compare two numbers: the URL count in the GSC Sitemaps report and the indexed count in the Page indexing report filtered to that sitemap. A large gap between submitted and indexed URLs is a signal worth investigating. Common causes: pages with low quality signals being deprioritised, noindex/sitemap conflicts, server errors during crawl, or canonical mismatches where Google indexes a different version of the URL than the one in your sitemap.
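Diffing the two lists pinpoints which URLs fell into the gap. A sketch under assumptions: the CSV here stands in for a GSC export, and its exact column names vary by report version, so treat `URL`/`Status` as placeholders:

```python
import csv
import io

# Sitemap URLs (e.g. parsed from your sitemap files).
sitemap_urls = {
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
}

# Hypothetical GSC export; real column names vary by report version.
GSC_EXPORT = """URL,Status
https://example.com/a,Indexed
https://example.com/b,Excluded by 'noindex' tag
"""

rows = list(csv.DictReader(io.StringIO(GSC_EXPORT)))
indexed = {r["URL"] for r in rows if r["Status"] == "Indexed"}

missing = sorted(sitemap_urls - indexed)  # submitted but not indexed
reasons = {r["URL"]: r["Status"] for r in rows if r["URL"] in missing}
print(missing)   # ['https://example.com/b', 'https://example.com/c']
print(reasons)   # b has an explicit exclusion reason; c never appeared in the report
```

URLs with an explicit exclusion status point at a fixable conflict; URLs absent from the report entirely usually mean Google hasn't crawled them yet.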
💡 Tip
In Level 1 of SEOdisaster, one scenario involves a site migration where the new sitemap was never submitted and 40% of pages dropped from the index. You'll diagnose the gap using simulated GSC data and rebuild crawl coverage under a deadline.