Check Pages Indexed by Google – Complete Site Audit Tool

Stop guessing which pages Google actually sees. This guide walks you through how to check pages indexed by Google using Search Console, the site: operator, and specialized crawlers, then shows you exactly how to diagnose and fix blocked, dropped, or orphaned URLs.

Why Checking Indexed Pages Is the First Real Audit Step

Every SEO audit that matters starts with one question: what pages does Google actually have in its index? Not what you think it has, not what your sitemap lists, not what your CMS reports. The real index.

In practice, when you run a fresh index check on a mid-size site (say 2,000 pages), it is common to find 200 to 400 pages simply missing from the index. No errors. No warnings. Just silent exclusion. Those pages still cost you crawl budget, dilute topical authority, and in the worst cases cause duplicate content issues that drag down the pages that do rank.

This article is a direct, opinionated workflow to check pages indexed by Google, pinpoint what fell out, and fix it before your next core update hits.

Three Ways to Check Pages Indexed by Google – Comparison

| Method | How It Works | Best For | Hidden Risk / Failure Mode |
|---|---|---|---|
| Google Search Console - URL Inspection & Coverage report | Submit a single URL or pull the 'Indexed' count from the Pages report. Data is from Google's index, not a crawl. | Daily monitoring, bulk URL checks, error classification (excluded, crawled, not indexed). | Coverage report can show 100% indexed even when 15% of pages return soft 404s. GSC only reports what it tried to index. |
| site: operator - site:example.com | Returns a sample of indexed URLs. Google explicitly says it is not exhaustive. | Quick sanity check, competitor index size estimation, spotting hacked or spammy indexed pages. | Results are heavily sampled. For a site with 10k indexed pages, site: may show only 300. Misleads beginners into thinking index is empty. |
| Third-party crawlers (Screaming Frog, Sitebulb, custom API) | Crawl your site then cross-reference crawled URLs with GSC API or live index checks via URL Inspection API. | Agencies, bulk audits, automated workflows, comparing crawl coverage vs indexed coverage. | API quota limits (200 URLs/day free tier). Rate limiting can break overnight runs. Requires GSC verification for each property. |

The Index Audit Workflow – From Crawl to Fix

Crawl Your Site

Use Screaming Frog or Sitebulb. Set to 5 threads, 1s delay. Export all internal URLs (incl. noindex).

Pull GSC Index Data

Export the 'Pages' (page indexing) report from Google Search Console. Capture both the 'Indexed' and 'Not indexed' URL lists, along with the reason GSC gives for each exclusion.

Cross-Reference Lists

Match your crawl list against GSC indexed list using a VLOOKUP or Python script. Flag URLs missing from GSC.

Classify Missing URLs

For each missing URL, check: noindex tag? robots.txt disallow? server error? orphan (no internal links)?

Fix Blocked or Weak Pages

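Remove noindex on valuable pages. Update robots.txt. Add internal links. Then request indexing through the URL Inspection tool in GSC. The URL Inspection API only reports index status; it cannot submit indexing requests.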

Monitor Reindexing

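After 7-14 days, re-run the workflow and compare the new indexed count against your baseline. As a rough benchmark, expect 80-90% of the fixed pages to be back in the index once Googlebot has recrawled them a couple of times. Google does not guarantee a timeline, so treat slower recovery as a cue to re-check the fixes.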
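The cross-reference step can be sketched in Python instead of VLOOKUP. This is a minimal sketch, not a definitive implementation: it assumes the crawl export has an `Address` column and the GSC export has a `URL` column (check your own file headers), and the `normalize` rules (lowercase host, no fragment, no trailing slash) are our own simplification of URL matching.

```python
import csv
import io
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme/host, drop the fragment, strip a trailing slash (assumed matching rules)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def read_urls(csv_text, column):
    """Read one URL column from CSV text into a set of normalized URLs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalize(row[column]) for row in reader if row.get(column)}


def missing_from_index(crawled, indexed):
    """URLs found by the crawl that do not appear in the GSC indexed list."""
    return sorted(crawled - indexed)


# Tiny inline samples standing in for the two exported CSV files.
crawl_csv = "Address\nhttps://Example.com/a/\nhttps://example.com/b\nhttps://example.com/c#top\n"
gsc_csv = "URL\nhttps://example.com/a\nhttps://example.com/c\n"

crawled = read_urls(crawl_csv, "Address")
indexed = read_urls(gsc_csv, "URL")
print(missing_from_index(crawled, indexed))
```

For real exports, read the files with `open(path, newline="", encoding="utf-8").read()` and pass the text to `read_urls`. Normalizing before comparing avoids false "missing" flags caused only by trailing slashes or host casing.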

Diagnostic Table – Why Pages Fail to Index and How to Fix Them

| Indexing Failure Mode | Detection Method | Immediate Fix | Long-Term Prevention |
|---|---|---|---|
| noindex tag present in HTML or HTTP header | Check page source or use Screaming Frog filter: 'Indexability: noindex' | Remove the noindex meta tag or header. Re-submit URL via GSC URL Inspection > Request Indexing. | Add a content governance rule: noindex only on thin pages, not on product pages. Audit quarterly. |
| robots.txt disallow | Check robots.txt for 'Disallow: /path/'. Use the robots.txt report in GSC (the standalone robots.txt Tester has been retired). | Rewrite robots.txt to allow the blocked path. Wait 24h for Google's cached copy to update. Request indexing. | Never block pages you want indexed. Use robots.txt only for admin, staging, or duplicate endpoints. |
| Soft 404 (page loads but says 'not found') | GSC Pages report lists the URL under 'Soft 404'. Also check GA bounce rate > 90%. | Return a proper 404 or 301 to a relevant page. Add structured data to distinguish empty results from real pages. | Automate pre-launch checks with a tool that validates HTTP status code AND page content for 'not found' patterns. |
| Orphaned pages (no internal links pointing to them) | Compare your sitemap or crawl list against your internal link graph. Orphans appear in the sitemap or URL list but have zero inlinks. | Add contextual internal links from at least two relevant pages. Update your sitemap. | Run a monthly orphan detection script. For large sites, use a tool that maps link distance from homepage. |
| Duplicate without canonical | Screaming Frog 'Canonical' column shows blank or self-referencing only. GSC shows 'Duplicate without user-selected canonical'. | Set a rel=canonical tag pointing to the preferred version. Consolidate similar pages into one. | Adopt a strict URL structure policy. Use hreflang only on truly international pages, not on region variants. |
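
The checks in the table can be chained into one classifier. This is a rough sketch under stated assumptions: the function name `classify`, its inputs (status code, headers, HTML, robots.txt text, inlink count from your crawler), and the soft-404 phrases are all illustrative choices of ours, and a real audit would tune the patterns per site.

```python
import re
from urllib.robotparser import RobotFileParser

META_TAG = re.compile(r"<meta\b[^>]*>", re.I)
# Example phrases only; adjust to the wording your templates actually use.
SOFT_404_TEXT = re.compile(r"page not found|no results found|no longer available", re.I)


def has_noindex_meta(html):
    """True if any robots/googlebot meta tag contains 'noindex'."""
    for tag in META_TAG.findall(html):
        low = tag.lower()
        if re.search(r"name=[\"']?(robots|googlebot)\b", low) and "noindex" in low:
            return True
    return False


def classify(url, status, headers, html, robots_txt, inlinks):
    """Return the first likely indexing blocker for a URL missing from the index."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("Googlebot", url):
        return "robots.txt disallow"
    if status >= 400:
        return f"HTTP error {status}"
    lowered = {k.lower(): v for k, v in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower() or has_noindex_meta(html):
        return "noindex"
    if status == 200 and SOFT_404_TEXT.search(html):
        return "possible soft 404"
    if inlinks == 0:
        return "orphan"
    return "no obvious blocker"


print(classify("https://example.com/p", 200, {}, '<meta name="robots" content="noindex">', "", 3))
```

The order matters: a robots.txt block is checked first because Google cannot see a noindex tag on a page it is not allowed to fetch.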

Using the GSC Index Coverage Report the Right Way

The Google Search Console Pages report is your single most reliable source for Google's own index status. But most people misuse it. They look at the green 'Indexed' count (labelled 'Valid' in the older Coverage report) and think they are done. That is dangerous.

A common situation we see: the Pages report shows 1,200 indexed pages, but the sitemap has 1,800 URLs. The difference is 600 pages that Google either has not crawled, could not crawl, or chose not to index. You need to click into each 'Not indexed' reason: 'Not found (404)', 'Crawled - currently not indexed', 'Discovered - currently not indexed'. Each reason requires a different fix. Do not dismiss 'Crawled - currently not indexed' as normal. It often points to content quality issues.

For a deeper technical dive, Google's official documentation on sitemap best practices explains how proper sitemap submission helps Google discover your most important pages. Use it to structure your sitemap by priority, not by URL length.

Worked Example: Audit of a 5,000-Page E-Commerce Site

Scenario: An e-commerce client with 5,000 product pages. Organic traffic dropped 40% after a site migration from HTTP to HTTPS. We needed to check pages indexed by Google to find the cause.

Step 1: Crawled the live site with Screaming Frog. Found 5,120 internal URLs (product pages + categories + blog). Exported list.

Step 2: Pulled the GSC Pages report. 'Indexed' count was 3,410. That is a 33% gap: 1,710 pages missing.

Step 3: Cross-referenced using a VLOOKUP in Google Sheets. 980 of the missing pages returned a 404 (old HTTP URLs still in the sitemap). 430 had a noindex tag accidentally inherited from the staging environment. 300 were orphaned – no internal links pointing to them after the migration.

Step 4: Fixed the 404s with 301 redirects to the new HTTPS versions. Removed noindex from the 430 product pages. Added contextual internal links from category pages to the 300 orphans.

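Step 5: Submitted the updated sitemap and requested indexing through the GSC URL Inspection tool for the highest-priority URLs among the 730 fixed pages, leaving the rest to be picked up from the sitemap. (The URL Inspection API only reports index status; it cannot request indexing.)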

Result: After 3 weeks, indexed count rose to 4,850. Organic traffic recovered to 92% of pre-migration levels. The remaining 150 of the 5,000 product pages had thin content (duplicate descriptions) and required a content rewrite.
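
The figures in this example can be sanity-checked with a few lines of arithmetic. The snippet below is only a check of the numbers quoted above; the variable names are ours.

```python
crawled = 5120          # internal URLs found by the crawl (Step 1)
indexed_before = 3410   # 'Indexed' count in GSC (Step 2)

missing = crawled - indexed_before
gap_pct = round(100 * missing / crawled, 1)

# Breakdown of the missing pages from Step 3.
causes = {"404": 980, "noindex": 430, "orphan": 300}
assert sum(causes.values()) == missing  # the three causes account for every missing page

# Pages fixed in place (noindex removed or links added), as referenced in Step 5.
fixed_in_place = causes["noindex"] + causes["orphan"]

# Product pages still not indexed after recovery.
still_missing_products = 5000 - 4850

print(missing, gap_pct, fixed_in_place, still_missing_products)
```

Running the same breakdown on your own audit is a quick way to confirm that every missing URL has been assigned a cause before you start fixing.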

Crawl Buckets, API Limits, and Other Operational Realities

When you routinely check pages indexed by Google, you hit real-world constraints. The URL Inspection API allows 2,000 inspection requests per day per verified property, and the free plans of many third-party tools cap checks at around 200 URLs per day. For a site with 10,000 pages, a full URL-by-URL index audit therefore takes 5 days at the API quota and up to 50 days on a 200-per-day plan. Plan ahead.
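
The quota math generalizes to any site size. A minimal sketch, assuming a fixed daily quota and one request per URL; `audit_days` and `daily_batches` are hypothetical helper names:

```python
import math


def audit_days(url_count, daily_quota):
    """Days needed to inspect every URL once at a fixed daily quota."""
    return math.ceil(url_count / daily_quota)


def daily_batches(urls, daily_quota):
    """Split a URL list into per-day batches that each fit the quota."""
    return [urls[i:i + daily_quota] for i in range(0, len(urls), daily_quota)]


print(audit_days(10_000, 2_000))  # API quota per property
print(audit_days(10_000, 200))    # 200-per-day tool plan

batches = daily_batches([f"https://example.com/p{i}" for i in range(450)], 200)
print([len(b) for b in batches])
```

Scheduling one batch per day keeps an overnight run from failing halfway through when the quota runs out.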

Another reality: the site: operator lies. Not intentionally, but by design. It returns a sample. If your site has 100k indexed pages, site: might show 1,500. That is not an indexation problem. It is a sampling problem. Never base a diagnosis on site: counts alone.

Slow vendors are another bottleneck. Some third-party tools cache GSC data for 24 to 48 hours. If you fix a noindex tag today, the tool might still report it as blocked tomorrow. Always verify directly in GSC before declaring victory.

Index Audit Checklist – Before You Start

1. Verify Google Search Console ownership for the exact domain (including www vs non-www, HTTP vs HTTPS).
2. Export your sitemap as a flat list of URLs from your CMS or from GSC Sitemaps report.
3. Run a full crawl with a tool that respects robots.txt and logs HTTP status codes.
4. Set up a cross-reference spreadsheet or script: column A = crawled URLs, column B = GSC indexed URLs, column C = match/missing.
5. Identify your target indexation rate: for content sites, aim for 95%+ of submitted URLs indexed.
6. Allocate 3-4 hours for the first audit on a site up to 5,000 pages.
7. If using the GSC API, check your quota before starting. Spread large audits across multiple days.

FAQ – How to Check Pages Indexed by Google (and Fix Issues)

How to check pages indexed by Google for a 10,000-page site without hitting API limits?

Do not rely on the free URL Inspection API alone. Export the GSC Pages report, which lists indexed and excluded URLs in bulk. Each export is capped at 1,000 rows, so filter the report by sitemap or export each 'Not indexed' reason separately to widen coverage. Cross-reference with your crawl list. For the URLs that are not indexed, use the URL Inspection API only for those specific pages. This reduces API calls from 10,000 to a few hundred.

How to check pages indexed by Google using Screaming Frog and GSC together?

Crawl your site with Screaming Frog, export all internal URLs. Then in GSC, go to Pages > Export > 'Full table'. Open both CSVs in Excel. Use VLOOKUP or XLOOKUP to match URLs. Any URL in the crawl that does not appear in the GSC indexed list needs investigation. This is the gold standard for a manual audit.

How to check pages indexed by Google for a client's site if I only have view access in GSC?

View access is enough. You can still export the Pages report and use the URL Inspection tool for individual checks. You cannot submit indexing requests or update sitemaps, but you can diagnose the problem. Document the gaps and ask the owner to fix noindex tags or redirects. For bulk checks, use a third-party tool that connects via GSC API with the owner's token.

How to check pages indexed by Google after a site migration, step by step?

Step 1: Crawl the new site. Step 2: Compare the URL list against the old site's URL list (get it from the Wayback Machine or an old crawl). Step 3: In GSC, check the 'Indexed' count for the new property. Step 4: Look for 404s in the old sitemap that are still being requested. Step 5: Set up 301 redirects from old URLs to new ones. Step 6: Submit the new sitemap. Step 7: Monitor the Coverage report daily for two weeks.

How to check pages indexed by Google when GSC shows 'Crawled but not indexed' for 500+ pages?

That status means Google crawled the page but chose not to index it. Common causes: thin content, low word count, duplicate content, or low perceived relevance. Pick a sample of 20 pages. Check word count, page title uniqueness, and whether the page adds value beyond the category page. If most are thin, rewrite or consolidate them. If they are high-quality but still not indexed, request indexing via the URL Inspection tool in small daily batches, since the tool enforces a daily request quota.

How to check pages indexed by Google for a site that has a large number of orphan pages?

Orphan pages cannot be discovered by crawling from your homepage. To find them, use your sitemap as the starting URL list. Crawl the sitemap URLs and compare them with the crawl of your entire site (starting from the homepage). Any sitemap URL not found in the full crawl is an orphan. Check if those orphans are indexed via GSC. If not, add internal links from relevant pages.
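
The sitemap-versus-crawl comparison described above can be sketched as a set difference. This is a minimal sketch assuming a standard XML sitemap and a list of URLs reached by following links from the homepage; `sitemap_urls` and `find_orphans` are illustrative names, and URLs are compared as exact strings.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> value from sitemap XML text."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text}


def find_orphans(in_sitemap, reached_by_links, indexed):
    """Split sitemap URLs never reached via internal links by index status."""
    orphans = set(in_sitemap) - set(reached_by_links)
    return sorted(orphans & set(indexed)), sorted(orphans - set(indexed))


sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-guide</loc></url>
  <url><loc>https://example.com/landing-x</loc></url>
</urlset>"""

link_crawl = ["https://example.com/"]
gsc_indexed = ["https://example.com/", "https://example.com/old-guide"]

indexed_orphans, unindexed_orphans = find_orphans(sitemap_urls(sitemap_xml), link_crawl, gsc_indexed)
print(indexed_orphans, unindexed_orphans)
```

Unindexed orphans are the priority for new internal links; indexed orphans still benefit from links but are less urgent.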

How to check pages indexed by Google using an API for automated reporting?

Use the URL Inspection API, which is part of the Google Search Console API, for standard pages. The separate Indexing API is limited to job posting and livestream video pages, and it notifies Google of changes rather than reporting index status. Write a script in Python or Node.js that reads a list of URLs from a CSV, sends each to the URL Inspection API, and logs the 'inspectionResult.indexStatusResult.verdict'. Store results in a database. Set up a weekly cron job. Be mindful of the 2,000 queries per day limit per property.
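
A script along those lines might look like the sketch below. Treat it as an outline under assumptions rather than a finished tool: it assumes the `google-api-python-client` package and OAuth credentials you have already obtained, the helper names (`make_gsc_inspector`, `extract_verdict`, `inspect_batch`) are ours, and the demo at the bottom uses a stand-in function instead of a live API call.

```python
def make_gsc_inspector(credentials, site_url):
    """Build a callable that inspects one URL via the Search Console API.

    Assumes `pip install google-api-python-client` and valid credentials.
    """
    from googleapiclient.discovery import build

    service = build("searchconsole", "v1", credentials=credentials)

    def inspect(url):
        body = {"inspectionUrl": url, "siteUrl": site_url}
        return service.urlInspection().index().inspect(body=body).execute()

    return inspect


def extract_verdict(response):
    """Pull verdict and coverage state out of an inspection response dict."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    return status.get("verdict", "VERDICT_UNSPECIFIED"), status.get("coverageState", "")


def inspect_batch(urls, inspect, daily_quota=2000):
    """Inspect up to `daily_quota` URLs; return results plus the URLs left for another day."""
    results = {url: extract_verdict(inspect(url)) for url in urls[:daily_quota]}
    return results, urls[daily_quota:]


# Stand-in inspector for a dry run without credentials (hypothetical response data).
def fake_inspect(url):
    verdict = "PASS" if url.endswith("/a") else "NEUTRAL"
    return {"inspectionResult": {"indexStatusResult": {"verdict": verdict, "coverageState": "demo"}}}


results, remaining = inspect_batch(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    fake_inspect,
    daily_quota=2,
)
print(results, remaining)
```

Passing the inspector in as a function keeps the quota logic testable offline; swap `fake_inspect` for `make_gsc_inspector(credentials, "https://example.com/")` in a real run, and write `results` to your database from the cron job.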

How to check pages indexed by Google for guest posts or backlinks to ensure they are indexed?

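The GSC URL Inspection tool only works for properties you have access to, so it can check a guest post only if the publisher grants you access to their property. Without access, search Google for the exact URL with the site: operator as a spot check. If the page does not appear, it may carry a noindex tag or sit on a low-authority site that Google crawls rarely. Ask the site owner to remove any noindex, add internal links from their homepage or blog hub, and request indexing from their own Search Console. Monitor weekly until the URL appears in the index.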

How to check pages indexed by Google when the site: operator returns only 10 results but I know there are 1,000 pages?

The site: operator is not a reliable count. It returns a sample, often heavily filtered. It does not mean your pages are deindexed. Instead, use GSC Pages report for an accurate count. If GSC also shows low numbers, run a crawl to see if the pages are blocked by robots.txt, return 404s, or have noindex tags. Do not trust site: for anything beyond a quick sniff test.

How to check pages indexed by Google for a site with multiple subdomains?

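A Domain property (verified via DNS) covers every subdomain and protocol in one place. If you use URL-prefix properties instead, each subdomain (blog.example.com, shop.example.com) is a separate property in GSC and must be verified on its own. Either way, run the index check separately per subdomain so gaps can be traced to the right section of the site. To see the full picture with URL-prefix properties, export each property's Pages report and combine them. Watch out for cross-subdomain duplication – a product page on shop.example.com and a blog post about the same product on blog.example.com can compete for the same query.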

Putting It All Together – Your Index Audit Cadence

Checking pages indexed by Google is not a one-time task. It should be part of your monthly SEO operations. After you fix the initial gaps, set up a monitoring system: weekly GSC coverage export, monthly crawl-to-index comparison, and an alert when the indexed count drops by more than 5% in a week.
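
The 5% weekly alert is simple to automate once you log the indexed count each week. A minimal sketch, assuming a plain list of weekly counts in chronological order; `weeks_with_drop` is a hypothetical name and the sample numbers are made up for illustration.

```python
def weeks_with_drop(weekly_counts, threshold=0.05):
    """Return indexes of weeks whose indexed count fell by more than `threshold` week over week."""
    alerts = []
    for week in range(1, len(weekly_counts)):
        previous, current = weekly_counts[week - 1], weekly_counts[week]
        if previous > 0 and (previous - current) / previous > threshold:
            alerts.append(week)
    return alerts


# Illustrative weekly 'Indexed' counts from the GSC export.
counts = [4850, 4860, 4790, 4400, 4410]
print(weeks_with_drop(counts))
```

Here only the fourth reading triggers an alert: the fall from 4,790 to 4,400 is about 8%, while the earlier dip from 4,860 to 4,790 is under 2%.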

For a more structured approach, use the Google Index Update Detection Checklist to correlate index drops with algorithm updates. This helps you distinguish between a technical failure (e.g., a bad robots.txt change) and a Google quality update that affected your pages' inclusion.

Keep your sitemap clean, your internal links intentional, and your noindex tags rare. That is the foundation. The audit is just the verification.
