Duplicate content is any substantial block of content that appears at more than one URL — either within your own site or across the web. Google doesn't penalise duplicate content directly, but it does have to choose one canonical version to index and rank. If it picks the wrong one, or splits its signals across multiple versions, your rankings suffer.
The most common sources of duplicate content
- HTTP vs. HTTPS versions both accessible (e.g. http://example.com and https://example.com returning the same page)
- WWW vs. non-WWW variants not redirected to a single canonical version
- URL parameters creating duplicate pages (/products?sort=asc, /products?sort=desc, /products all showing the same content)
- Pagination duplicates — page 1 content leaking onto /page/2 via shared introductory sections
- Printer-friendly or AMP versions without proper canonical tags
- Scraped content — third parties copying your pages (you're the victim, but you still lose the signal)
- Category/tag archive pages that duplicate post content with no unique value
How to audit for duplicate content
Run a full site crawl with Screaming Frog (free up to 500 URLs) or Sitebulb. Filter the results by 'Duplicate Page Titles', 'Duplicate H1s', and 'Duplicate Meta Descriptions' first — these are the fastest proxy signals. Then check the Content tab's duplicate filters, which cluster URLs with identical or near-identical body content (near-duplicate analysis may need to be enabled in the crawl configuration first).
✦ Insight
For parameter-driven duplication on large e-commerce or SaaS sites, check GSC → Indexing → Pages and filter for 'Duplicate without user-selected canonical' and 'Duplicate, Google chose different canonical than user'. These two statuses tell you exactly where Google is ignoring your canonical instructions.
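The duplicate-title triage above is easy to script once you have a crawl export. Below is a minimal sketch that groups URLs by normalised page title and returns only the clusters shared by two or more URLs. The `findDuplicateTitles` helper and its input shape (`{url, title}` records, such as you might parse from a Screaming Frog internal HTML export) are illustrative assumptions, not a real Screaming Frog API.

```javascript
// Sketch: surface duplicate-title clusters from crawl data.
// Input is an array of {url, title} records (hypothetical shape,
// e.g. parsed from a crawler's CSV export).
function findDuplicateTitles(pages) {
  const byTitle = new Map();
  for (const { url, title } of pages) {
    // Normalise so trailing whitespace or case differences still cluster
    const key = title.trim().toLowerCase();
    if (!byTitle.has(key)) byTitle.set(key, []);
    byTitle.get(key).push(url);
  }
  // Keep only titles shared by two or more URLs
  return [...byTitle.entries()].filter(([, urls]) => urls.length > 1);
}
```

Each returned entry is a `[title, urls]` pair, which maps directly onto the consolidation decisions in the fixes below: one canonical survivor per cluster.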
Fix 1: 301 redirects for structural duplicates
For HTTP/HTTPS and WWW/non-WWW duplicates, a 301 redirect is the correct fix — not a canonical tag. A canonical is a hint; a redirect is a command. Implement 301 redirects at the server or CDN level to funnel all variants to a single preferred URL. This is also required for any legacy URL migrations.
```nginx
# nginx: redirect HTTP (both hosts) to the canonical https://www host.
# A matching server block on 443 for example.com (with the site's certificate)
# is also needed to catch https://example.com.
server {
    listen 80;
    server_name example.com www.example.com;
    return 301 https://www.example.com$request_uri;
}
```

```js
// Next.js: host-based redirect in next.config.js
module.exports = {
  redirects: async () => [
    { source: '/:path*', has: [{ type: 'host', value: 'example.com' }],
      destination: 'https://www.example.com/:path*', permanent: true },
  ],
};
```

Fix 2: Canonical tags for parameter-driven duplicates
For URL parameters that generate near-duplicate pages (sort, filter, session IDs), add a canonical tag on every parameterised page pointing back to the clean base URL, plus a self-referencing canonical on the base URL itself. This tells Google to consolidate all ranking signals on the canonical version.
```html
<!-- On /products?sort=price&color=red, canonical points to the clean URL -->
<link rel="canonical" href="https://example.com/products" />

<!-- On /products itself, the canonical is self-referencing -->
<link rel="canonical" href="https://example.com/products" />
```

Fix 3: Consolidate thin or near-duplicate pages
If you have multiple pages on very similar topics (e.g. 'Best CRM for startups', 'Best CRM for small business', 'Best CRM for SaaS companies') that each rank poorly, consider merging them into one comprehensive page and 301-redirecting the others. A single authoritative page beats three mediocre ones in Google's eyes.
⚠️ Warning
Don't noindex your way out of duplicate content problems at scale. Noindexed pages still get crawled (at least initially), still consume crawl budget, and don't pass link equity — over time Google effectively treats a long-standing noindex as noindex,nofollow. For pages you genuinely want to remove from Google's consideration, consolidate and redirect rather than noindex.
💡 Tip
Chapter 1 of SEOdisaster includes a canonical tag crisis scenario — a platform migration that created thousands of duplicate URLs overnight. Work through the triage in the game to build the pattern recognition you need for real audits.