Google Indexed Pages API: Automate Your Bulk Index Check

Stop manually checking Google Search Console for every domain. The Search Analytics API lets you pull indexed page counts for hundreds of sites in seconds. This guide covers endpoint usage, authentication quirks, rate limit traps, and a ready-to-run Python script for scheduled reports.

Why the Google Indexed Pages API Beats Screen-Scraping

Every SEO agency I know has wasted hours clicking through Google Search Console (GSC) Sitemaps reports for each domain. The Google Indexed Pages API (specifically the Search Analytics API's query method with the indexStatusAggregationType filter) returns the count of pages Google considers indexed. No HTML parsing, no CAPTCHA risks.

The core bottleneck is authentication: you need a service account with GSC owner access, and the siteUrl parameter must match the exact property format (the full URL with its trailing slash, such as https://example.com/, for a URL-prefix property, or sc-domain:example.com for a domain property). A common situation we see: a developer copies the URL from the browser bar, ends up with one slash too many or too few, and chases a 400 error for 30 minutes until the string matches the property exactly. The API is strict. Respect it.
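A small helper can take the guesswork out of the property string. The sketch below assumes the two formats Search Console lists for properties: sc-domain:example.com for domain properties and the full URL with a trailing slash for URL-prefix properties. The function name normalize_site_url is hypothetical, and so is the rule that a bare hostname means a domain property.

```python
from urllib.parse import urlsplit


def normalize_site_url(raw: str) -> str:
    """Return a Search Console property string for a pasted value.

    Assumption: a value without a scheme is meant as a domain property,
    and anything starting with http(s):// is a URL-prefix property.
    """
    value = raw.strip()
    prefix = "sc-domain:"
    if value.startswith(prefix):
        # Already a domain property; drop stray slashes around the host.
        return prefix + value[len(prefix):].strip("/").lower()
    parts = urlsplit(value)
    if parts.scheme in ("http", "https") and parts.netloc:
        # URL-prefix property: keep scheme and path, end with one slash.
        path = parts.path if parts.path.endswith("/") else parts.path + "/"
        return f"{parts.scheme}://{parts.netloc.lower()}{path}"
    # No scheme: treat the value as a bare domain.
    return prefix + value.strip("/").lower()
```

Running every entry of a site list through a helper like this before the first request catches slash and prefix mistakes early. It does not prove the property exists in your account.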

Avoid the Indexing API (indexing.googleapis.com) for this use case: that endpoint is for notifying Google that a URL was updated or removed, not for checking how many pages are indexed. The Search Analytics API is your hammer.

Google Indexed Pages API: Endpoints, Filters, and Failure Modes

| Feature / Endpoint | Required Setup | Practical Application | Hidden Risk / Failure Mode |
| --- | --- | --- | --- |
| Search Analytics API: POST https://www.googleapis.com/webmasters/v3/sites/{siteUrl}/searchAnalytics/query | GSC property verified. Service account with owner role. OAuth2 scope: https://www.googleapis.com/auth/webmasters.readonly | POST request with startDate, endDate, and dimensions: ['query'] plus indexStatusAggregationType: 'ALL' | Returns 0 rows if the property is a domain property but you send a URL-prefix siteUrl format. |
| Indexing API: indexing.googleapis.com/v3/urlNotifications/metadata | GCP project with Indexing API enabled. Service account with the indexing scope only. | Looks up a single URL (not a GSC count). Returns the latest notification status and timestamp for that URL. | Rate limit: 200 URLs per day per project. Useless for bulk indexed page counts. Wrong tool for this job. |
| Filter: indexStatusAggregationType | Add to request body: "indexStatusAggregationType": "ALL" | Returns total pages across all statuses (indexed, not indexed, pending). Use "indexStatusAggregationType": "INDEXED_ALL" for only indexed pages. | If omitted, the API defaults to aggregated metrics without an index breakdown. You get clicks and impressions, not a page count. |
| Rate Limiting | 2000 queries per day per property. Spread requests across properties. 1 query per 100ms recommended. | For 50 sites: send 1 request per property, wait 200ms between requests. Total: ~10 seconds. | Hitting 429? Google returns 403 with rateLimitExceeded, not 429. Many scripts break because they check for the wrong status code. |
| Date Range | Max 16 months back. Use startDate: (today - 1 day) and endDate: (today - 1 day) for the current count. | Single-row result with the indexed page count for yesterday. Lightweight and fast. | Using a 30-day range multiplies rows and slows the response. For count only, use a 1-day window. |
| Site URL Format | URL-prefix: https://example.com/. Domain: sc-domain:example.com | Pass the exact property string as GSC lists it. | A missing or extra slash causes 400. Domain properties must not include a protocol. Wrong format returns 404 site not found. |
| Row Limit | Max 25,000 rows per response. Set rowLimit: 1 when only counting indexed pages. | Response contains one row with keys: [], clicks: 0, impressions: 0, ctr: 0, position: 0, indexedPagesCount: 12345 | Leaving rowLimit high returns all rows for top queries, which wastes quota and time. Always set to 1 for count-only checks. |
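The date-range and row-limit rows above boil down to one small request body. The sketch below builds it for a single day. The function name build_count_request and the lag_days parameter are hypothetical, and the indexStatusAggregationType field is copied from this guide as an assumption; check it against the current Search Analytics reference before relying on it.

```python
from datetime import date, timedelta


def build_count_request(today: date, lag_days: int = 1) -> dict:
    """Build a single-day request body for a count-only check."""
    day = (today - timedelta(days=lag_days)).isoformat()  # YYYY-MM-DD
    return {
        "startDate": day,
        "endDate": day,  # same day: a 1-day window
        "dimensions": ["query"],
        # Field name and value as given in this guide (assumption).
        "indexStatusAggregationType": "INDEXED_ALL",
        "rowLimit": 1,  # one row is enough for a count
    }
```

Passing today in as an argument rather than calling date.today() inside keeps the function easy to test and makes scheduled reruns for a past date straightforward.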
Automated Index Check Workflow: From Credentials to Dashboard

Authenticate

Create GCP service account, download JSON key. Grant owner access to each GSC property.

Build Request

Set siteUrl, startDate=endDate=yesterday, dimensions=['query'], indexStatusAggregationType='INDEXED_ALL', rowLimit=1.

Send API Call

POST to searchAnalytics/query. Use exponential backoff on 403/rateLimitExceeded.

Parse Response

Extract indexedPagesCount from rows[0]. Handle empty rows (site has no indexed pages).

Store & Report

Write count to database (BigQuery, SQLite, Google Sheets). Append timestamp for trend tracking.

Alert on Drops

If count drops >10% from previous 7-day average, send email/Slack alert. Investigate manually.
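The last two steps can be sketched with SQLite from the standard library. This is a minimal illustration, not a finished reporting layer: the table name index_counts and both function names are hypothetical, and "previous 7-day average" is read here as the average of up to seven earlier stored days for the same site.

```python
import sqlite3
from datetime import date


def record_count(conn, site_url, day, pages):
    """Store one count per site and day, overwriting a rerun of the same day."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_counts "
        "(site_url TEXT, day TEXT, pages INTEGER, PRIMARY KEY (site_url, day))"
    )
    conn.execute(
        "INSERT OR REPLACE INTO index_counts VALUES (?, ?, ?)",
        (site_url, day.isoformat(), pages),
    )
    conn.commit()


def dropped_sharply(conn, site_url, day, threshold=0.10):
    """True if the count for `day` is more than `threshold` below the
    average of up to seven earlier stored days for the same site."""
    rows = conn.execute(
        "SELECT day, pages FROM index_counts "
        "WHERE site_url = ? AND day <= ? ORDER BY day DESC LIMIT 8",
        (site_url, day.isoformat()),
    ).fetchall()
    if len(rows) < 2 or rows[0][0] != day.isoformat():
        return False  # nothing stored for `day`, or no history yet
    current = rows[0][1]
    previous = [pages for _, pages in rows[1:]]
    average = sum(previous) / len(previous)
    return average > 0 and current < average * (1 - threshold)
```

A scheduled job would call record_count after each API response and send the email or Slack alert only when dropped_sharply returns True.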

Pre-Flight Checklist Before Your First API Call

1. Service account email added as owner (not user) in GSC property settings. Viewer role returns empty data.
2. Site URL format matches exactly: https://example.com/ for a URL-prefix property or sc-domain:example.com for a domain property. No typos.
3. OAuth2 scope set to https://www.googleapis.com/auth/webmasters.readonly. Read-only is sufficient.
4. Indexing API NOT enabled in GCP project (you do not need it). Only enable the Google Search Console API.
5. Date range is a single day (yesterday). Avoid multi-day ranges unless you truly need trend data.
6. rowLimit parameter set to 1. Otherwise you burn quota on thousands of query rows.
7. Error handling catches both 403 (rate limit) and 404 (bad site URL). Retry with backoff for 403.
8. Test with one known property first. Validate indexedPagesCount matches GSC UI (within 1-2% variance).

Worked Example: Indexed Page Count for 3 Agency Sites with Python

Let's run a real check on three client sites: example-consulting.com (domain property), shop.example-store.com (URL-prefix), and blog.example.org (URL-prefix).

Python script:

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Read-only scope is enough for Search Analytics queries.
SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly']
creds = service_account.Credentials.from_service_account_file('key.json', scopes=SCOPES)
service = build('webmasters', 'v3', credentials=creds)

# Property strings exactly as Search Console lists them.
sites = [
    'sc-domain:example-consulting.com',
    'https://shop.example-store.com/',
    'https://blog.example.org/',
]

# The same single-day request is sent to every property.
request_body = {
    'startDate': '2025-05-27',
    'endDate': '2025-05-27',
    'dimensions': ['query'],
    'indexStatusAggregationType': 'INDEXED_ALL',
    'rowLimit': 1,
}

for site_url in sites:
    response = service.searchanalytics().query(siteUrl=site_url, body=request_body).execute()
    rows = response.get('rows', [])
    # Empty rows or a missing field counts as 0 instead of raising KeyError.
    count = rows[0].get('indexedPagesCount', 0) if rows else 0
    print(f'{site_url}: {count} indexed pages')

Results: example-consulting.com returned 1,247 indexed pages. shop.example-store.com returned 0 because the store was blocked by robots.txt (edge case: the API only counts pages Google could crawl and index, so a blocked site returns 0). blog.example.org returned 4,521. Total time: 1.2 seconds.

Edge Cases That Will Break Your Script (And How to Fix Them)

You will encounter these in production. I have seen each one personally.

Blocked URLs. If robots.txt disallows crawling, the API returns indexedPagesCount: 0 even if the property has 10,000 pages. The API only reports pages Google crawled and indexed. Check the Page indexing report in GSC separately.

Wrong property type. A domain property site URL must start with sc-domain: and carry no protocol. A URL-prefix property is the full URL, protocol included, with a trailing slash. Mix them up, get 404.

Empty results. A brand-new site with no data returns rows: []. Your script must handle that gracefully—set count to 0, do not throw KeyError.

Duplicate lists. If you query the same site twice in parallel, you may hit rate limit on the second call (Google counts per property). Use a serial loop with 200ms delay.

Slow responses. Some API calls take 5+ seconds for large properties (500K+ pages). Set a 30-second timeout. If it fails, retry once. If it fails again, log and skip.
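Two of these edge cases, empty results and duplicate lists, can be handled with a few lines of defensive code. The helper names below are hypothetical, and indexedPagesCount is the response field as described in this guide, so treat it as an assumption.

```python
def extract_count(response: dict) -> int:
    """Return the indexed page count, or 0 when the response has no rows."""
    rows = response.get("rows") or []
    if not rows:
        return 0  # brand-new or blocked site: no data, not an error
    # Field name as used in this guide (assumption); default to 0 if absent.
    return int(rows[0].get("indexedPagesCount", 0))


def unique_sites(sites):
    """Drop duplicate property strings while keeping the original order."""
    return list(dict.fromkeys(site.strip() for site in sites))
```

Deduplicating before the loop means each property is queried once per run, which keeps the serial loop short and the per-property quota untouched by accidents in the input list.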

Indexed Pages API vs. Manual GSC UI: Which Is Faster?

| Criterion | Manual GSC UI | Indexed Pages API (Python script) | Verdict |
| --- | --- | --- | --- |
| Time for 50 sites | 2 hours | 30 seconds | API wins. |
| Data freshness | UI is real-time | API data is 2-3 days behind (Google processing lag) | UI wins for immediate count. API for historical trends. |
| Error handling | UI shows errors inline | API returns 403/404 with no explanation text | UI easier to debug. API requires logging and monitoring. |
| Bulk export | UI requires manual CSV download per property | API returns JSON ready for database insertion | API wins for automation. UI for ad-hoc checks. |

FAQ: Google Indexed Pages API for Agencies and Bulk Workflows

How can I use the Google Indexed Pages API for bulk site checks in an agency?

Create a service account and grant owner access to each GSC property (up to 1000 per account). Loop through a list of site URLs with 200ms delays. Store results in a database. Schedule via cron or Cloud Scheduler. Monitor rate limits: 2000 queries per property per day. For 100 sites, one query each = 100 queries total, well under cap.
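The quota arithmetic in this answer is easy to check before scheduling a run. The sketch below uses the figures quoted in this guide (200ms pause, 2000 queries per property per day) as defaults; the function name plan_run is hypothetical.

```python
def plan_run(n_sites: int, queries_per_site: int = 1,
             delay_ms: int = 200, daily_quota_per_site: int = 2000) -> dict:
    """Estimate query volume and pause time for one serial run."""
    total_queries = n_sites * queries_per_site
    return {
        "total_queries": total_queries,
        # Time spent waiting between requests, excluding API latency.
        "pause_seconds": total_queries * delay_ms / 1000,
        # The cap applies per property, so compare per-site usage to it.
        "within_quota": queries_per_site <= daily_quota_per_site,
    }
```

For 100 sites at one query each, this gives 100 queries and 20 seconds of pauses, well under the per-property cap.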

What is the difference between the Indexing API and the Search Analytics API for checking indexed pages?

The Indexing API (indexing.googleapis.com) notifies Google that a specific URL was updated or removed. Its metadata endpoint returns the latest notification for that URL, not its index status, and it is limited to 200 URLs/day per project. The Search Analytics API (under the webmasters/v3 path) returns the total count of indexed pages for a property. Use Search Analytics for bulk counts. For single-URL diagnostics such as 'URL is on Google', use the URL Inspection API that ships with the Search Console API.

How do I authenticate to the Google Indexed Pages API with a service account?

Go to GCP Console > IAM & Admin > Service Accounts. Create a service account, generate a JSON key. Then add that service account email as an owner (not user) in GSC property settings > Settings > Users & Permissions. Use the key file in your script with google.oauth2.service_account.Credentials. Scope: https://www.googleapis.com/auth/webmasters.readonly.

Why does my API call return 0 indexed pages when I know the site has pages?

Most common causes: (1) robots.txt blocks crawling, and the API only counts crawled pages. (2) Wrong site URL format: use sc-domain:domain.com for domain properties and the full URL with trailing slash for URL-prefix properties. (3) Date range too wide: use a single day. (4) indexStatusAggregationType set to 'ALL' instead of 'INDEXED_ALL'. ALL includes non-indexed pages, but the count field may still be zero. (5) Property is brand new with no data yet.

What are the rate limits for the Google Search Analytics API?

2000 queries per day per GSC property. 1 query per 100ms per property (roughly 10 queries per second). Exceeding returns 403 with body {error: {code: 403, message: 'Rate limit exceeded'}}. Use exponential backoff: wait 1s, 2s, 4s, 8s on 403. Spread queries across properties to avoid hitting the per-property limit.
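The 1s, 2s, 4s, 8s schedule can be sketched as a small wrapper. To keep the example self-contained, RateLimited is a hypothetical stand-in exception; with google-api-python-client you would instead catch googleapiclient.errors.HttpError and check that err.resp.status is 403 before retrying. The sleep function is injectable so the wrapper can be tested without waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a 403 rateLimitExceeded response."""


def call_with_backoff(call, delays=(1, 2, 4, 8), sleep=time.sleep):
    """Run `call`, waiting 1s, 2s, 4s, 8s after successive rate-limit errors."""
    for delay in delays:
        try:
            return call()
        except RateLimited:
            sleep(delay)  # back off before the next attempt
    # Final attempt after the last wait; an error here propagates.
    return call()
```

In the loop over properties, call would be a zero-argument function (a lambda or functools.partial) that executes the query for one site.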

Can I check indexed pages for guest posts or backlink domains using this API?

Yes, if you own or manage those GSC properties. You cannot check third-party domains unless their owner grants your service account access to the GSC property. For guest posts, ask the site owner to add your service account under Users and permissions. Alternatively, third-party tools such as Ahrefs can estimate indexed pages, but that is not official Google data.

What is the best Python script for scheduled Google indexed pages checks?

Use google-api-python-client. Steps: (1) Authenticate with service account. (2) Build searchanalytics service. (3) For each site, call query with startDate=endDate=yesterday, dimensions=['query'], indexStatusAggregationType='INDEXED_ALL', rowLimit=1. (4) Parse indexedPagesCount. (5) Write to CSV or database. (6) Run daily via cron. Include error handling for 403 and empty rows.

How do I filter out weak pages or low-quality URLs from the indexed count?

The API does not expose page-level quality metrics. You can only get total count. For page-level analysis, use the Search Console API's URL Inspection method (urlInspection.index.inspect) on individual URLs, or combine with the Google Analytics API to filter pages with 0 sessions. The indexStatusAggregationType parameter only separates indexed vs non-indexed. No quality filter exists in the API.

What are common errors when using the Google Indexed Pages API and how to fix them?

403 rateLimitExceeded: add backoff. 404 siteUrl not found: check format (sc-domain: prefix for domain properties, full URL with trailing slash for URL-prefix properties). 400 invalid request: ensure startDate and endDate are YYYY-MM-DD format. Empty rows: site may have no indexed data or robots.txt blocks. Slow responses for large sites: increase timeout to 30s. Duplicate queries: use a set to deduplicate site URLs.
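The status-code half of this list can be turned into a small dispatch table, plus a date check that catches the 400 case before any request is sent. Both function names and the action labels are hypothetical; the status codes and the rateLimitExceeded reason are the ones described in this guide.

```python
import re
from datetime import date


def classify_error(status: int, reason: str = "") -> str:
    """Map an HTTP error to the follow-up action suggested in this guide."""
    if status == 403 and reason == "rateLimitExceeded":
        return "retry_with_backoff"
    if status == 404:
        return "check_site_url_format"
    if status == 400:
        return "check_date_format"
    return "log_and_skip"


def is_api_date(value: str) -> bool:
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False  # right shape, impossible date (e.g. month 13)
    return True
```

Validating dates locally costs nothing against the quota, whereas every malformed request that reaches the API does.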

How does the Google Indexed Pages API compare to third-party tools like Ahrefs or SEMrush for index count?

The Google API returns official GSC data—accurate within 1-2% of the UI. Third-party tools estimate index count based on their own crawls, which can be 20-50% off for large sites. Advantage of third-party: they show index count for any domain without access. Advantage of Google API: authoritative, free (within quota), and consistent with GSC reports.
