Web Crawler API
Start the Web Crawler
Launch the Opensolr web crawler for your index. Configure crawl mode, scope, threads, rendering engine, traffic limits, and more — all via API parameters.
Endpoint
GET https://opensolr.com/solr_manager/api/start_crawl
Parameters
| Parameter | Status | Description |
|---|---|---|
| email | Required | Your Opensolr registration email address |
| api_key | Required | Your Opensolr API key |
| core_name | Required | The name of the index to start crawling for |
| follow_docs | Optional | Follow documents and images? yes / no (default: no) |
| clean | Optional | Start fresh or resume? yes = start fresh, no = resume (default: no) |
| auth_username | Optional | HTTP Basic Auth username, if your starting URLs require authentication |
| auth_password | Optional | HTTP Basic Auth password |
| mode | Optional | Crawl mode 1–6 (default: 1). See mode table below |
| max_threads | Optional | Number of concurrent crawler threads (default: 10) |
| relax | Optional | Delay between requests in microseconds (default: 100000 = 0.1s) |
| max_traffic | Optional | Maximum traffic in GB (default: plan limit) |
| max_pages | Optional | Maximum pages to crawl (default: plan limit) |
| max_filesize | Optional | Maximum file size per page in KB (default: plan limit) |
| renderer | Optional | Rendering engine: curl or chrome (default: curl). See table below |
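Since relax is expressed in microseconds, it is easy to request a much longer or shorter delay than intended. A small illustrative helper (the function name is our own, not part of the Opensolr API) converts a per-request delay in seconds to the value the parameter expects:

```python
def relax_from_seconds(delay_seconds):
    """Convert a per-request delay in seconds to the relax parameter value.

    relax is measured in microseconds: 1 second = 1,000,000 microseconds.
    """
    return int(delay_seconds * 1_000_000)

print(relax_from_seconds(0.1))  # 100000 -- the default (0.1 s between requests)
print(relax_from_seconds(0.5))  # 500000 -- a gentler half-second delay
```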
Crawl Modes
| Mode | Name | Scope | Description |
|---|---|---|---|
| 1 | Follow Domain Links | Full depth | Follows all links across the entire domain, including subdomains. Starting from www.site.com, it will also crawl shop.site.com, blog.site.com, etc. |
| 2 | Follow Host Links | Full depth | Stays on the exact hostname only. Starting from www.site.com, links to shop.site.com are ignored. |
| 3 | Follow Path Links | Full depth | Stays within the URL path prefix on the same host. Starting from site.com/blog/, it will crawl site.com/blog/2024/post but skip site.com/about. |
| 4 | Shallow Domain Crawl | Depth 1 | Same domain scope as Mode 1, but only discovers links from the start page and its direct children. |
| 5 | Shallow Host Crawl | Depth 1 | Same host scope as Mode 2, but only discovers links from the start page and its direct children. |
| 6 | Shallow Path Crawl | Depth 1 | Same path scope as Mode 3, but only discovers links from the start page and its direct children. Useful for quickly indexing a product catalog or documentation section. |
Full depth = the crawler keeps discovering and following new links at every level. Depth 1 = the crawler reads the start page, follows links it finds there, but stops discovering new links after that.
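The scope rules above can be sketched as simple URL checks. This is an illustration of the matching logic only, not Opensolr's actual implementation, and the naive domain extraction below assumes a two-label registered domain (it would misjudge domains like example.co.uk):

```python
from urllib.parse import urlparse

def in_scope(start_url, candidate_url, mode):
    """Illustrative scope check for crawl modes 1-6.

    Shallow modes 4-6 share the scope rules of modes 1-3;
    depth limiting is a separate concern not modeled here.
    """
    start, cand = urlparse(start_url), urlparse(candidate_url)
    scope = (mode - 1) % 3  # 0 = domain, 1 = host, 2 = path
    if scope == 0:
        # Domain scope: registered domain must match; subdomains are allowed.
        domain = ".".join(start.hostname.split(".")[-2:])
        return cand.hostname == domain or cand.hostname.endswith("." + domain)
    if scope == 1:
        # Host scope: the exact hostname must match.
        return cand.hostname == start.hostname
    # Path scope: same host, and the candidate path keeps the start path prefix.
    return cand.hostname == start.hostname and cand.path.startswith(start.path)

# Mode 1 keeps a subdomain in scope; Mode 2 rejects it.
print(in_scope("https://www.site.com/", "https://shop.site.com/x", 1))  # True
print(in_scope("https://www.site.com/", "https://shop.site.com/x", 2))  # False
# Mode 3 stays inside the /blog/ prefix.
print(in_scope("https://site.com/blog/", "https://site.com/about", 3))  # False
```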
Rendering Engines
| Value | Name | Description |
|---|---|---|
| curl | Curl (Fast) | Fast HTTP fetch with no browser overhead. Best for most websites — server-rendered HTML, static sites, WordPress, Drupal, etc. Pages are fetched in ~0.2s each. |
| chrome | Chrome (JS Rendering) | Every page is rendered through a headless Chromium browser (Playwright). Use for JavaScript-heavy SPAs built with React, Vue, Angular, Next.js, etc. Adds ~0.5–1s per page. |
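If you are unsure which renderer a site needs, one rough heuristic (our own assumption, not an official Opensolr rule) is to fetch the raw HTML and check how much visible text the body contains without JavaScript. SPAs typically ship a near-empty mount point like `<div id="root"></div>` plus script payloads, which suggests renderer=chrome:

```python
import re

def looks_js_rendered(html):
    """Heuristic: little server-rendered body text suggests a JS-rendered SPA.

    This is a rough sketch; it can misclassify pages, so treat the result
    as a hint for choosing renderer=chrome, not a definitive answer.
    """
    body = re.search(r"<body[^>]*>(.*?)</body>", html, re.S | re.I)
    inner = body.group(1) if body else ""
    # Strip script blocks, then all remaining tags, keeping visible text only.
    inner = re.sub(r"<script.*?</script>", "", inner, flags=re.S | re.I)
    visible = re.sub(r"<[^>]+>", "", inner).strip()
    return len(visible) < 200  # sparse text -> likely needs JS rendering

spa = '<body><div id="root"></div><script>app()</script></body>'
print(looks_js_rendered(spa))  # True -- consider renderer=chrome
```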
All parameters are saved and preserved across scheduled cron restarts. If you pause and resume the crawler, or if the cron schedule restarts it, your original settings (threads, relax delay, traffic limits, etc.) are retained.
Code Examples
cURL — Start a basic crawl

```shell
curl -s "https://opensolr.com/solr_manager/api/start_crawl?email=YOUR_EMAIL&api_key=YOUR_API_KEY&core_name=my_solr_core"
```
cURL — Full crawl with all options

```shell
curl -s "https://opensolr.com/solr_manager/api/start_crawl?email=YOUR_EMAIL&api_key=YOUR_API_KEY&core_name=my_solr_core&clean=yes&mode=2&max_threads=5&relax=200000&renderer=chrome&follow_docs=yes&max_pages=10000"
```
PHP

```php
<?php
$params = http_build_query([
    'email'       => 'YOUR_EMAIL',
    'api_key'     => 'YOUR_API_KEY',
    'core_name'   => 'my_solr_core',
    'clean'       => 'yes',
    'mode'        => 2,
    'max_threads' => 5,
    'renderer'    => 'chrome',
]);
$response = file_get_contents("https://opensolr.com/solr_manager/api/start_crawl?{$params}");
$result = json_decode($response, true);
print_r($result);
```
Python

```python
import requests

response = requests.get(
    "https://opensolr.com/solr_manager/api/start_crawl",
    params={
        "email": "YOUR_EMAIL",
        "api_key": "YOUR_API_KEY",
        "core_name": "my_solr_core",
        "clean": "yes",
        "mode": 2,
        "max_threads": 5,
        "renderer": "chrome",
    },
)
print(response.json())
```
Related Documentation
Start Crawler: Launch the web crawler with full control over mode, threads, and rendering.
Pause Crawler: Temporarily halt crawling while keeping the cron schedule.
Resume Crawler: Continue crawling immediately from where it left off.
Check Status: Check whether crawler processes are currently active.
Get Live Stats: Real-time crawler statistics: pages crawled, queued, errors.
Flush Buffer: Force pending documents from the crawler buffer into Solr.
Need help with the Opensolr Web Crawler? We are here to help.
Contact Support