Web Crawler


API - Get LIVE Web Crawler Stats

Get real-time statistics for your web crawler: pages crawled, pages queued, errors, traffic used, and more.

Endpoint

GET https://opensolr.com/solr_manager/api/get_crawl_stats

Parameters

Parameter   Status     Description
---------   ------     -----------
email       Required   Your Opensolr registration email address
api_key     Required   Your Opensolr API key
core_name   Required   The name of the index you wish to get live crawler stats for

Code Examples

cURL

curl -s "https://opensolr.com/solr_manager/api/get_crawl_stats?email=YOUR_EMAIL&api_key=YOUR_API_KEY&core_name=my_solr_core"

PHP

$params = http_build_query([
    'email'     => 'YOUR_EMAIL',
    'api_key'   => 'YOUR_API_KEY',
    'core_name' => 'my_solr_core',
]);
$response = file_get_contents("https://opensolr.com/solr_manager/api/get_crawl_stats?{$params}");
$stats = json_decode($response, true);
print_r($stats);

Python

import requests

response = requests.get("https://opensolr.com/solr_manager/api/get_crawl_stats", params={
    "email": "YOUR_EMAIL",
    "api_key": "YOUR_API_KEY",
    "core_name": "my_solr_core",
})
stats = response.json()
print(stats)

Use this endpoint to monitor crawler progress in real time. It's useful for building dashboards or deciding when to pause or resume the crawler.
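As a sketch of that kind of monitoring loop, the stdlib-only Python below polls the stats endpoint until the crawl queue drains. The `pages_queued` field name and the "empty queue means finished" heuristic are assumptions, since the exact JSON response shape is not documented above.

```python
import json
import time
import urllib.parse
import urllib.request

STATS_URL = "https://opensolr.com/solr_manager/api/get_crawl_stats"

def fetch_crawl_stats(email, api_key, core_name):
    """Fetch the live crawler stats as a dict (stdlib-only variant of the
    requests example above)."""
    query = urllib.parse.urlencode({
        "email": email,
        "api_key": api_key,
        "core_name": core_name,
    })
    with urllib.request.urlopen(f"{STATS_URL}?{query}") as resp:
        return json.load(resp)

def is_crawl_finished(stats):
    """Assumed heuristic: the crawl is done when nothing is queued.
    The 'pages_queued' key is an assumption about the JSON shape."""
    return int(stats.get("pages_queued", 0)) == 0

def wait_for_crawl(email, api_key, core_name, interval=30):
    """Poll until the queue drains, printing each stats snapshot."""
    while True:
        stats = fetch_crawl_stats(email, api_key, core_name)
        print(stats)
        if is_crawl_finished(stats):
            return stats
        time.sleep(interval)
```

A dashboard would typically call `fetch_crawl_stats` on a timer instead of blocking in `wait_for_crawl`.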

Need help with the Opensolr Web Crawler? We are here to help.

Contact Support

API - Start the Web Crawler

Launch the Opensolr web crawler for your index. Configure crawl mode, scope, threads, rendering engine, traffic limits, and more — all via API parameters.

Endpoint

GET https://opensolr.com/solr_manager/api/start_crawl

Parameters

Parameter       Status     Description
---------       ------     -----------
email           Required   Your Opensolr registration email address
api_key         Required   Your Opensolr API key
core_name       Required   The name of the index to start crawling for
follow_docs     Optional   Follow documents and images? yes / no (default: no)
clean           Optional   Start fresh or resume? yes = start fresh, no = resume (default: no)
auth_username   Optional   HTTP Basic Auth username if your starting URLs require authentication
auth_password   Optional   HTTP Basic Auth password
mode            Optional   Crawl mode 1-6 (default: 1). See the mode table below
max_threads     Optional   Number of concurrent crawler threads (default: 10)
relax           Optional   Delay between requests in microseconds (default: 100000 = 0.1s)
max_traffic     Optional   Maximum traffic in GB (default: plan limit)
max_pages       Optional   Maximum pages to crawl (default: plan limit)
max_filesize    Optional   Maximum file size per page in KB (default: plan limit)
renderer        Optional   Rendering engine: curl or chrome (default: curl). See the table below
schedule        Optional   Create a recurring cron schedule? yes = crawler re-runs automatically to pick up new content, no = one-time crawl only (default: yes). Use no for on-demand reindex operations.

Crawl Modes

Mode   Name                   Scope        Description
----   ----                   -----        -----------
1      Follow Domain Links    Full depth   Follows all links across the entire domain, including subdomains. Starting from www.site.com, it will also crawl shop.site.com, blog.site.com, etc.
2      Follow Host Links      Full depth   Stays on the exact hostname only. Starting from www.site.com, links to shop.site.com are ignored.
3      Follow Path Links      Full depth   Stays within the URL path prefix on the same host. Starting from site.com/blog/, it will crawl site.com/blog/2024/post but skip site.com/about.
4      Shallow Domain Crawl   Depth 1      Same domain scope as Mode 1, but only discovers links from the start page and its direct children.
5      Shallow Host Crawl     Depth 1      Same host scope as Mode 2, but only discovers links from the start page and its direct children.
6      Shallow Path Crawl     Depth 1      Same path scope as Mode 3, but only discovers links from the start page and its direct children. Useful for quickly indexing a product catalog or documentation section.

Full depth = the crawler keeps discovering and following new links at every level. Depth 1 = the crawler reads the start page, follows links it finds there, but stops discovering new links after that.
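The mode numbers are easy to mix up, so a small helper that maps the scope and depth combinations from the table above onto the numeric `mode` values can keep call sites readable. This is a hypothetical local convenience, not part of the Opensolr API; the string keys are a naming choice for this sketch.

```python
# Map (scope, depth) combinations from the crawl-mode table onto the
# numeric `mode` values accepted by start_crawl. The string keys are a
# local naming choice, not Opensolr parameters.
CRAWL_MODES = {
    ("domain", "full"): 1,     # follow links across all subdomains
    ("host", "full"): 2,       # stay on the exact hostname
    ("path", "full"): 3,       # stay under the starting URL path
    ("domain", "shallow"): 4,  # domain scope, depth 1
    ("host", "shallow"): 5,    # host scope, depth 1
    ("path", "shallow"): 6,    # path scope, depth 1
}

def crawl_mode(scope: str, depth: str = "full") -> int:
    """Return the start_crawl `mode` number for a scope/depth pair."""
    try:
        return CRAWL_MODES[(scope, depth)]
    except KeyError:
        raise ValueError(f"unknown scope/depth combination: {scope!r}/{depth!r}")
```

For example, `crawl_mode("path", "shallow")` returns 6, the mode suggested above for quickly indexing a documentation section.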

Rendering Engines

Value    Name                    Description
-----    ----                    -----------
curl     Curl (Fast)             Fast HTTP fetch with no browser overhead. Best for most websites: server-rendered HTML, static sites, WordPress, Drupal, etc. Pages are fetched in ~0.2s each.
chrome   Chrome (JS Rendering)   Every page is rendered through a headless Chromium browser (Playwright). Use for JavaScript-heavy SPAs built with React, Vue, Angular, Next.js, etc. Adds ~0.5–1s per page.

All parameters are saved and preserved across scheduled cron restarts. Use schedule=no for one-time crawl runs (e.g. Reindex From Scratch) that should not create or modify any cron schedule. If you pause and resume the crawler, or if the cron schedule restarts it, your original settings (threads, relax delay, traffic limits, etc.) are retained.

Code Examples

cURL — Start a basic crawl

curl -s "https://opensolr.com/solr_manager/api/start_crawl?email=YOUR_EMAIL&api_key=YOUR_API_KEY&core_name=my_solr_core"

cURL — Full crawl with all options

curl -s "https://opensolr.com/solr_manager/api/start_crawl?email=YOUR_EMAIL&api_key=YOUR_API_KEY&core_name=my_solr_core&clean=yes&mode=2&max_threads=5&relax=200000&renderer=chrome&follow_docs=yes&max_pages=10000&schedule=no"

PHP

$params = http_build_query([
    'email'       => 'YOUR_EMAIL',
    'api_key'     => 'YOUR_API_KEY',
    'core_name'   => 'my_solr_core',
    'clean'       => 'yes',
    'mode'        => 2,
    'max_threads' => 5,
    'renderer'    => 'chrome',
]);
$response = file_get_contents("https://opensolr.com/solr_manager/api/start_crawl?{$params}");
$result = json_decode($response, true);
print_r($result);

Python

import requests

response = requests.get("https://opensolr.com/solr_manager/api/start_crawl", params={
    "email": "YOUR_EMAIL",
    "api_key": "YOUR_API_KEY",
    "core_name": "my_solr_core",
    "clean": "yes",
    "mode": 2,
    "max_threads": 5,
    "renderer": "chrome",
})
print(response.json())


API - Stop the Web Crawler

Permanently stops the web crawler by killing all running processes and removing the cron schedule. The crawler will not run again until manually restarted. If you only want to temporarily pause crawling, use the Pause API instead.

Endpoint

GET https://opensolr.com/solr_manager/api/stop_crawl

Parameters

Parameter   Status     Description
---------   ------     -----------
email       Required   Your Opensolr registration email address
api_key     Required   Your Opensolr API key
core_name   Required   The name of the index you wish to stop the web crawler for

Example

https://opensolr.com/solr_manager/api/stop_crawl?email=YOUR_EMAIL&api_key=YOUR_API_KEY&core_name=my_solr_core
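Following the pattern of the other endpoints, a stop call from Python might look like the stdlib-only sketch below. The endpoint and parameters are exactly those documented for stop_crawl above; the helper names are a local choice.

```python
import json
import urllib.parse
import urllib.request

STOP_URL = "https://opensolr.com/solr_manager/api/stop_crawl"

def build_stop_url(email: str, api_key: str, core_name: str) -> str:
    """Assemble the stop_crawl request URL from the documented parameters."""
    query = urllib.parse.urlencode({
        "email": email,
        "api_key": api_key,
        "core_name": core_name,
    })
    return f"{STOP_URL}?{query}"

def stop_crawl(email: str, api_key: str, core_name: str):
    """Kill the crawler processes and remove the cron schedule for the index,
    assuming the endpoint returns a JSON body like the other API calls."""
    with urllib.request.urlopen(build_stop_url(email, api_key, core_name)) as resp:
        return json.load(resp)
```

Remember that this is permanent: after `stop_crawl` the crawler stays down until you call start_crawl again.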