Stop manually checking Google Search Console for every domain. The Search Analytics API lets you pull indexed page counts for hundreds of sites in seconds. This guide covers endpoint usage, authentication quirks, rate limit traps, and a ready-to-run Python script for scheduled reports.
Every SEO agency I know has wasted hours clicking through Google Search Console (GSC) Sitemaps reports for each domain. The Google Indexed Pages API—specifically the Search Analytics API method query with the indexStatusAggregationType filter—returns the exact count of pages Google considers indexed. No HTML parsing, no CAPTCHA risks, no stale data.
The core bottleneck is authentication: you need a service account with GSC owner access, and the siteUrl parameter must match the exact property format (sc-domain:example.com for a domain property, the full URL such as https://example.com/ for a URL-prefix property). A common situation we see: a developer copies the URL from the browser bar with an extra path or a mismatched trailing slash, and gets errors for 30 minutes until the string matches the GSC property exactly. The API is strict. Respect it.
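As a minimal sketch of that strictness, the helper below turns a pasted hostname or URL into a property identifier before any API call is made. It assumes the identifier formats given in Google's Search Console API reference (`sc-domain:example.com` for Domain properties, the full URL for URL-prefix properties); `to_property_id` is a hypothetical name, not part of any library.

```python
from urllib.parse import urlsplit


def to_property_id(raw: str, domain_property: bool) -> str:
    """Build a Search Console property identifier from a pasted URL or hostname."""
    value = raw.strip()
    if domain_property:
        # Domain properties carry no protocol, path, or slash.
        host = urlsplit(value).netloc if "://" in value else value.split("/")[0]
        if not host:
            raise ValueError(f"no hostname found in {raw!r}")
        return f"sc-domain:{host.lower()}"
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"URL-prefix properties need a full http(s) URL, got {raw!r}")
    # A bare origin gets a trailing slash; any deeper path is kept as typed.
    path = parts.path or "/"
    return f"{parts.scheme}://{parts.netloc.lower()}{path}"
```

Running every entry in your site list through a check like this once, at load time, surfaces format mistakes before they turn into API errors.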
Avoid the Indexing API (indexing.googleapis.com) for this use case: that endpoint is for notifying Google that individual URLs were added, updated, or removed, not for checking how many pages are indexed. The Search Analytics API is your hammer.
| Feature / Endpoint | Required Setup | Practical Application | Hidden Risk / Failure Mode |
|---|---|---|---|
| Search Analytics API: www.googleapis.com/webmasters/v3/sites/{siteUrl}/searchAnalytics/query | GSC property verified; service account with owner role; OAuth2 scope: https://www.googleapis.com/auth/webmasters.readonly | POST request with startDate, endDate, and dimensions: ['query'] plus indexStatusAggregationType: 'ALL' | Returns 0 rows if the property is a domain property but you send a URL-prefix siteUrl format |
| Indexing API: indexing.googleapis.com/v3/urlNotifications/metadata | GCP project with Indexing API enabled; service account with the indexing scope only | Looks up one URL at a time (no GSC count). Returns the latest notification status and timestamp | Rate limit: 200 URLs per day per project. Useless for bulk indexed page counts. Wrong tool for this job. |
| Filter: indexStatusAggregationType | Add to request body: "indexStatusAggregationType": "ALL" | Returns total pages across all statuses (indexed, not indexed, pending). Use "indexStatusAggregationType": "INDEXED_ALL" for only indexed pages | If omitted, the API defaults to aggregated metrics without an index breakdown. You get clicks and impressions, not a page count. |
| Rate limiting: 2000 queries per day per property | Spread requests across properties. 1 query per 100ms recommended | For 50 sites: send 1 request per property, wait 200ms between requests. Total: ~10 seconds | Expecting 429? Google returns 403 with rateLimitExceeded, not 429. Many scripts break because they check for the wrong status code. |
| Date range: max 16 months back | Use startDate: (today - 1 day) and endDate: (today - 1 day) for current count | Single-row result with the indexed page count for yesterday. Lightweight and fast | Using a 30-day range multiplies rows and slows the response. For a count only, use a 1-day window. |
| Site URL format | URL-prefix: https://example.com/; Domain: sc-domain:example.com | Pass the exact property string as defined in GSC | A missing or extra slash breaks the match. Domain properties must not include a protocol. A wrong format returns 404 site not found. |
| Row limit: max 25,000 rows per response | Set rowLimit: 1 when only counting indexed pages | Response contains one row with keys: [], clicks: 0, impressions: 0, ctr: 0, position: 0, indexedPagesCount: 12345 | Leaving rowLimit high returns all rows for top queries, which wastes quota and time. Always set it to 1 for count-only checks. |
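The request settings from the table can be sketched as a small builder. This is an illustration under stated assumptions: `build_count_request` is a hypothetical helper, and the `indexStatusAggregationType` field and its `INDEXED_ALL` value are taken from this guide as written, so confirm them against Google's API reference before relying on them.

```python
from datetime import date, timedelta
from typing import Optional


def build_count_request(today: Optional[date] = None) -> dict:
    """Request body for a one-day, one-row query, following the table above."""
    day = ((today or date.today()) - timedelta(days=1)).isoformat()  # yesterday
    return {
        "startDate": day,
        "endDate": day,
        "dimensions": ["query"],
        # Field name and value as given in this guide; not verified here.
        "indexStatusAggregationType": "INDEXED_ALL",
        "rowLimit": 1,
    }
```

Passing `today` explicitly keeps the function easy to test and lets a backfill job request a specific past day.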
1. Create GCP service account, download JSON key. Grant owner access to each GSC property.
2. Set siteUrl, startDate=endDate=yesterday, dimensions=['query'], indexStatusAggregationType='INDEXED_ALL', rowLimit=1.
3. POST to searchAnalytics/query. Use exponential backoff on 403/rateLimitExceeded.
4. Extract indexedPagesCount from rows[0]. Handle empty rows (site has no indexed pages).
5. Write count to database (BigQuery, SQLite, Google Sheets). Append timestamp for trend tracking.
6. If count drops >10% from previous 7-day average, send email/Slack alert. Investigate manually.
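The alert rule in the last step above can be sketched as a pure function. `should_alert` and its parameters are hypothetical names; the 10% threshold and 7-day window come from the step itself, and sending the email or Slack message is left to the caller.

```python
from statistics import mean
from typing import Sequence


def should_alert(today_count: int, previous_counts: Sequence[int],
                 threshold: float = 0.10) -> bool:
    """True when today's count is more than `threshold` below the prior 7-day mean."""
    window = list(previous_counts)[-7:]  # most recent seven stored counts
    if not window:
        return False  # no history yet, nothing to compare against
    baseline = mean(window)
    if baseline == 0:
        return False  # avoid dividing by zero for sites that were already at 0
    return (baseline - today_count) / baseline > threshold
```

A drop of exactly 10% does not fire, matching the "greater than 10%" wording; loosen the comparison if you prefer to be warned at the boundary.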
Service account email added as owner (not user) in GSC property settings. Viewer role returns empty data.
Site URL format matches exactly: https://example.com/ or sc-domain:example.com. No typos.
OAuth2 scope set to https://www.googleapis.com/auth/webmasters.readonly. Read-only is sufficient.
Indexing API NOT enabled in GCP project (you do not need it). Only enable the Google Search Console API.
Date range is a single day (yesterday). Avoid multi-day ranges unless you truly need trend data.
rowLimit parameter set to 1. Otherwise you burn quota on thousands of query rows.
Error handling catches both 403 (rate limit) and 404 (bad site URL). Retry with backoff for 403.
Test with one known property first. Validate indexedPagesCount matches GSC UI (within 1-2% variance).
Let's run a real check on three client sites: example-consulting.com (domain property), shop.example-store.com (URL-prefix), and blog.example.org (URL-prefix).
Python script:
from google.oauth2 import service_account
from googleapiclient.discovery import build
# Read-only scope is enough for querying.
SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly']
creds = service_account.Credentials.from_service_account_file('key.json', scopes=SCOPES)
service = build('webmasters', 'v3', credentials=creds)
# Property identifiers must match GSC exactly.
sites = [
    'sc-domain:example-consulting.com',
    'https://shop.example-store.com/',
    'https://blog.example.org/',
]
# The day being checked; use yesterday's date for scheduled runs.
CHECK_DATE = '2025-05-27'
for site_url in sites:
    request_body = {
        'startDate': CHECK_DATE,
        'endDate': CHECK_DATE,
        'dimensions': ['query'],
        'indexStatusAggregationType': 'INDEXED_ALL',
        'rowLimit': 1,
    }
    response = service.searchanalytics().query(siteUrl=site_url, body=request_body).execute()
    # 'rows' is absent when the property has no data; treat that as 0.
    rows = response.get('rows', [])
    count = rows[0].get('indexedPagesCount', 0) if rows else 0
    print(f'{site_url}: {count} indexed pages')
Results: example-consulting.com returned 1,247 indexed pages. shop.example-store.com returned 0 because the store was blocked by robots.txt (edge case: a property whose pages Google cannot crawl reports 0). blog.example.org returned 4,521. Total time: 1.2 seconds.
You will encounter these in production. I have seen each one personally.
Blocked URLs. If robots.txt disallows crawling, the API returns indexedPagesCount: 0 even if the property has 10,000 pages. The API only reports pages Google crawled and indexed. Check GSC Crawl Errors report separately.
Wrong property type. A domain property site URL must start with sc-domain: and carry no protocol. A URL-prefix property is the full URL, including protocol and trailing slash, exactly as shown in GSC. Mix them up, get 404.
Empty results. A brand-new site with no data returns rows: []. Your script must handle that gracefully—set count to 0, do not throw KeyError.
Duplicate lists. If you query the same site twice in parallel, you may hit rate limit on the second call (Google counts per property). Use a serial loop with 200ms delay.
Slow responses. Some API calls take 5+ seconds for large properties (500K+ pages). Set a 30-second timeout. If it fails, retry once. If it fails again, log and skip.
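The "retry once, then log and skip" rule for slow calls can be sketched as a wrapper around whatever function performs the request. `fetch_or_skip` and the `fetch` callable are hypothetical; the sketch assumes your HTTP client is already configured with the 30-second timeout and raises Python's built-in `TimeoutError` when it expires.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("gsc_counts")


def fetch_or_skip(fetch: Callable[[str], dict], site_url: str) -> Optional[dict]:
    """Call `fetch` at most twice; return None if both attempts time out."""
    for attempt in (1, 2):
        try:
            return fetch(site_url)
        except TimeoutError:
            logger.warning("timeout on %s (attempt %d of 2)", site_url, attempt)
    logger.error("skipping %s after two timeouts", site_url)
    return None
```

Returning `None` instead of raising lets the main loop carry on with the remaining sites and report the skipped ones at the end.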
| Aspect | Manual GSC UI | Indexed Pages API (Python script) | Verdict |
|---|---|---|---|
| Speed for 50 sites | About 2 hours of clicking | About 30 seconds | API wins. |
| Data freshness | Real-time | 2-3 days behind (Google processing lag) | UI wins for immediate count. API for historical trends. |
| Error handling | Shows errors inline | Returns 403/404 with no explanation text | UI easier to debug. API requires logging and monitoring. |
| Bulk export | Manual CSV download per property | Returns JSON ready for database insertion | API wins for automation. UI for ad-hoc checks. |
Create a service account and grant owner access to each GSC property (up to 1000 per account). Loop through a list of site URLs with 200ms delays. Store results in a database. Schedule via cron or Cloud Scheduler. Monitor rate limits: 2000 queries per property per day. For 100 sites, one query each = 100 queries total, well under cap.
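For the "store results in a database" part, a minimal SQLite sketch might look like the following. The table name, column names, and function names are all hypothetical; swap in BigQuery or Sheets if that suits your stack better.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the results database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_counts ("
        "site_url TEXT NOT NULL, checked_at TEXT NOT NULL, indexed_pages INTEGER NOT NULL)"
    )
    return conn


def store_count(conn: sqlite3.Connection, site_url: str, indexed_pages: int) -> None:
    """Append one timestamped reading for a site."""
    checked_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO indexed_counts VALUES (?, ?, ?)",
            (site_url, checked_at, indexed_pages),
        )


def latest_counts(conn: sqlite3.Connection, site_url: str, limit: int = 7) -> list:
    """Return the most recent readings for a site, newest first."""
    rows = conn.execute(
        "SELECT indexed_pages FROM indexed_counts "
        "WHERE site_url = ? ORDER BY rowid DESC LIMIT ?",
        (site_url, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Pass a file path to `open_store` in the scheduled job so readings persist between runs; the in-memory default is only convenient for trying it out.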
The Indexing API (indexing.googleapis.com) works on one URL at a time: it accepts update and removal notifications and reports the latest notification status for a URL. It is limited to 200 URLs/day per project. The Search Analytics API (www.googleapis.com/webmasters/v3) returns the total count of indexed pages for a property. Use Search Analytics for bulk counts. For single-URL index diagnostics, use the URL Inspection API rather than the Indexing API.
Go to GCP Console > IAM & Admin > Service Accounts. Create a service account, generate a JSON key. Then add that service account email as an owner (not user) in GSC property settings > Settings > Users & Permissions. Use the key file in your script with google.oauth2.service_account.Credentials. Scope: https://www.googleapis.com/auth/webmasters.readonly.
Most common causes: (1) robots.txt blocks crawling, and the API only counts crawled pages. (2) Wrong site URL format: use sc-domain:example.com for domain properties. (3) Date range too wide: use a single day. (4) indexStatusAggregationType set to 'ALL' instead of 'INDEXED_ALL': ALL includes non-indexed pages but the count field may still be zero. (5) Property is brand new with no data yet.
2000 queries per day per GSC property. 1 query per 100ms per property (roughly 10 queries per second). Exceeding returns 403 with body {error: {code: 403, message: 'Rate limit exceeded'}}. Use exponential backoff: wait 1s, 2s, 4s, 8s on 403. Spread queries across properties to avoid hitting the per-property limit.
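The 1s, 2s, 4s, 8s schedule can be sketched as a small wrapper. `RateLimited` is a hypothetical exception that your request code would raise when it sees a 403 with rateLimitExceeded; `with_backoff` is likewise an illustrative name, and the `sleep` parameter exists only so the waits can be replaced in tests.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker raised by the caller on a 403 rateLimitExceeded."""


def with_backoff(call: Callable[[], T],
                 delays: Sequence[float] = (1, 2, 4, 8),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, waiting 1s, 2s, 4s, 8s between rate-limited attempts."""
    for delay in delays:
        try:
            return call()
        except RateLimited:
            sleep(delay)
    # Final attempt after the last wait; a RateLimited here reaches the caller.
    return call()
```

With the default delays this makes at most five attempts and waits at most 15 seconds in total before giving up on a property.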
Yes, if you own or manage those GSC properties. You cannot check third-party domains unless they grant you owner access to their GSC property. For guest posts, ask the site owner to add your service account as a user. Alternatively, use the Site Audit API (from third-party tools like Ahrefs) to estimate indexed pages, but that is not official Google data.
Use google-api-python-client. Steps: (1) Authenticate with service account. (2) Build searchanalytics service. (3) For each site, call query with startDate=endDate=yesterday, dimensions=['query'], indexStatusAggregationType='INDEXED_ALL', rowLimit=1. (4) Parse indexedPagesCount. (5) Write to CSV or database. (6) Run daily via cron. Include error handling for 403 and empty rows.
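Step (4), including the empty-rows case, can be sketched as a small parser. `extract_count` is a hypothetical helper, and the `indexedPagesCount` field name is taken from this guide as written rather than checked against Google's response schema.

```python
def extract_count(response: dict) -> int:
    """Return the indexed page count from a query response, or 0 if there is none."""
    rows = response.get("rows") or []  # key is missing or empty for sites with no data
    if not rows:
        return 0
    # Field name as described in this guide; a missing field also counts as 0.
    return int(rows[0].get("indexedPagesCount", 0))
```

Using `.get` on both levels means a brand-new property or an unexpected response shape yields 0 instead of a KeyError.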
The API does not expose page-level quality metrics. You can only get total count. For page-level analysis, inspect individual URLs with the URL Inspection API or combine with Google Analytics data to filter pages with 0 sessions. The indexStatusAggregationType parameter only separates indexed vs non-indexed. No quality filter exists in the API.
403 rateLimitExceeded: add backoff. 404 siteUrl not found: check format (sc-domain: for domain properties, full URL for URL-prefix properties). 400 invalid request: ensure startDate and endDate are YYYY-MM-DD format. Empty rows: site may have no indexed data or robots.txt blocks. Slow responses for large sites: increase timeout to 30s. Duplicate queries: use a set to deduplicate site URLs.
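The status-code part of that checklist can be sketched as a lookup that tells the script what to do next. `next_action` and the action labels are hypothetical, and how you pull the status code and reason string out of your HTTP client's error object is left to the caller. Treating 429 the same as a rate-limited 403 is an extra precaution, not something this guide says Google returns.

```python
def next_action(status: int, reason: str = "") -> str:
    """Map an HTTP error from the API to the fix suggested in the checklist above."""
    if status == 429 or (status == 403 and "ratelimit" in reason.lower()):
        return "retry_with_backoff"
    if status == 404:
        return "check_site_url_format"
    if status == 400:
        return "check_request_body"  # e.g. dates not in YYYY-MM-DD format
    if status == 403:
        return "check_permissions"  # a 403 that is not a rate limit
    return "log_and_skip"
```

Keeping this mapping in one place makes it easy to log the chosen action next to each failure and to spot which class of error dominates.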
The Google API returns official GSC data—accurate within 1-2% of the UI. Third-party tools estimate index count based on their own crawls, which can be 20-50% off for large sites. Advantage of third-party: they show index count for any domain without access. Advantage of Google API: authoritative, free (within quota), and consistent with GSC reports.