API - Start the Web Crawler

Start the Web Crawler

Launch the Opensolr web crawler for your index. Configure crawl mode, scope, threads, rendering engine, traffic limits, and more — all via API parameters.

Endpoint

GET https://opensolr.com/solr_manager/api/start_crawl

Parameters

| Parameter | Status | Description |
| --- | --- | --- |
| email | Required | Your Opensolr registration email address |
| api_key | Required | Your Opensolr API key |
| core_name | Required | The name of the index to start crawling for |
| follow_docs | Optional | Follow documents and images? yes / no (default: no) |
| clean | Optional | Start fresh or resume? yes = start fresh, no = resume (default: no) |
| auth_username | Optional | HTTP Basic Auth username if your starting URLs require authentication |
| auth_password | Optional | HTTP Basic Auth password |
| mode | Optional | Crawl mode, 1 to 6 (default: 1). See the Crawl Modes table below |
| max_threads | Optional | Number of concurrent crawler threads (default: 10) |
| relax | Optional | Delay between requests in microseconds (default: 100000 = 0.1s) |
| max_traffic | Optional | Maximum traffic in GB (default: plan limit) |
| max_pages | Optional | Maximum pages to crawl (default: plan limit) |
| max_filesize | Optional | Maximum file size per page in KB (default: plan limit) |
| renderer | Optional | Rendering engine: curl or chrome (default: curl). See the Rendering Engines table below |
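
As a minimal sketch of how the parameters above fit together, the helper below checks a few values on the client side before building the request URL. The function name `build_start_crawl_url` and the choice to raise `ValueError` are illustrative assumptions, not part of the Opensolr API; only the allowed values are taken from the parameter table.

```python
from urllib.parse import urlencode

ENDPOINT = "https://opensolr.com/solr_manager/api/start_crawl"

def build_start_crawl_url(email, api_key, core_name, **options):
    """Return a start_crawl URL (hypothetical helper, not an Opensolr API).

    Only values documented in the parameter table are checked; the
    server remains the authority on what it accepts.
    """
    if "mode" in options and int(options["mode"]) not in range(1, 7):
        raise ValueError("mode must be between 1 and 6")
    if options.get("renderer", "curl") not in ("curl", "chrome"):
        raise ValueError("renderer must be 'curl' or 'chrome'")
    for flag in ("follow_docs", "clean"):
        if options.get(flag, "no") not in ("yes", "no"):
            raise ValueError(f"{flag} must be 'yes' or 'no'")
    # Required parameters first, then any optional ones, URL-encoded.
    params = {"email": email, "api_key": api_key, "core_name": core_name}
    params.update(options)
    return f"{ENDPOINT}?{urlencode(params)}"

url = build_start_crawl_url(
    "YOUR_EMAIL", "YOUR_API_KEY", "my_solr_core",
    mode=2, renderer="chrome", relax=200000,  # relax is in microseconds (0.2 s)
)
print(url)
```

Catching an out-of-range `mode` or a misspelled `renderer` locally avoids sending a request that the crawler might reject or silently interpret differently.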

Crawl Modes

| Mode | Name | Scope | Description |
| --- | --- | --- | --- |
| 1 | Follow Domain Links | Full depth | Follows all links across the entire domain, including subdomains. Starting from www.site.com, it will also crawl shop.site.com, blog.site.com, etc. |
| 2 | Follow Host Links | Full depth | Stays on the exact hostname only. Starting from www.site.com, links to shop.site.com are ignored. |
| 3 | Follow Path Links | Full depth | Stays within the URL path prefix on the same host. Starting from site.com/blog/, it will crawl site.com/blog/2024/post but skip site.com/about. |
| 4 | Shallow Domain Crawl | Depth 1 | Same domain scope as Mode 1, but only discovers links from the start page and its direct children. |
| 5 | Shallow Host Crawl | Depth 1 | Same host scope as Mode 2, but only discovers links from the start page and its direct children. |
| 6 | Shallow Path Crawl | Depth 1 | Same path scope as Mode 3, but only discovers links from the start page and its direct children. Useful for quickly indexing a product catalog or documentation section. |

Full depth = the crawler keeps discovering and following new links at every level. Depth 1 = the crawler reads the start page, follows links it finds there, but stops discovering new links after that.
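
The domain, host, and path scopes described above can be illustrated with a small URL check. This is a sketch of the rules as documented, not Opensolr's actual implementation: the names `registered_domain` and `in_scope` are hypothetical, the "last two labels" rule for the domain is a deliberate simplification, and depth limiting is not modeled.

```python
from urllib.parse import urlsplit

def registered_domain(host):
    # Naive assumption: the registered domain is the last two labels.
    # This is wrong for suffixes such as .co.uk; a real crawler would
    # more likely consult the Public Suffix List.
    return ".".join(host.lower().split(".")[-2:])

def in_scope(start_url, candidate_url, mode):
    """Illustrate the scope rules for modes 1-6 (not Opensolr's actual code)."""
    start, cand = urlsplit(start_url), urlsplit(candidate_url)
    start_host = start.hostname or ""
    cand_host = cand.hostname or ""
    scope = (mode - 1) % 3 + 1  # modes 4-6 reuse the scopes of modes 1-3
    if scope == 1:  # domain scope: subdomains are allowed
        return registered_domain(cand_host) == registered_domain(start_host)
    if scope == 2:  # host scope: exact hostname only
        return cand_host == start_host
    # path scope: same host and the candidate path begins with the start path
    return cand_host == start_host and cand.path.startswith(start.path)

print(in_scope("https://www.site.com/", "https://shop.site.com/cart", 1))
print(in_scope("https://www.site.com/", "https://shop.site.com/cart", 2))
print(in_scope("https://site.com/blog/", "https://site.com/about", 3))
```

The three calls mirror the examples in the mode table: the subdomain link is accepted under domain scope, rejected under host scope, and the `/about` page falls outside the `/blog/` path prefix.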

Rendering Engines

| Value | Name | Description |
| --- | --- | --- |
| curl | Curl (Fast) | Fast HTTP fetch with no browser overhead. Best for most websites: server-rendered HTML, static sites, WordPress, Drupal, etc. Pages are fetched in ~0.2s each. |
| chrome | Chrome (JS Rendering) | Every page is rendered through a headless Chromium browser (Playwright). Use for JavaScript-heavy SPAs built with React, Vue, Angular, Next.js, etc. Adds ~0.5–1s per page. |

All parameters are saved and preserved across scheduled cron restarts. If you pause and resume the crawler, or if the cron schedule restarts it, your original settings (threads, relax delay, traffic limits, etc.) are retained.
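
The per-page timings quoted for each rendering engine, together with `max_threads` and `relax`, allow a rough estimate of how long a crawl might take. The model below is an assumption made for illustration, not a documented formula: it treats the `relax` delay as applying per thread after every request, assumes threads run perfectly in parallel, and takes 0.75 s as the midpoint of the quoted Chrome overhead added on top of the curl fetch time. The function name `estimate_crawl_seconds` is hypothetical.

```python
def estimate_crawl_seconds(pages, max_threads=10, relax_us=100_000, renderer="curl"):
    """Back-of-envelope crawl duration (assumed model, not a documented formula)."""
    fetch_s = 0.2  # approximate curl fetch time quoted in the table
    if renderer == "chrome":
        fetch_s += 0.75  # assumed midpoint of the quoted 0.5-1 s overhead
    per_page_s = fetch_s + relax_us / 1_000_000  # relax is in microseconds
    return pages * per_page_s / max_threads

print(round(estimate_crawl_seconds(10_000)))                     # curl, defaults
print(round(estimate_crawl_seconds(10_000, renderer="chrome")))  # chrome, defaults
```

Under these assumptions, 10,000 pages with the default settings come out at roughly 5 minutes with curl and about 17.5 minutes with chrome. Real crawls depend on server response times and page size, so treat the result as an order-of-magnitude guide only.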

Code Examples

cURL — Start a basic crawl

curl -s "https://opensolr.com/solr_manager/api/start_crawl?email=YOUR_EMAIL&api_key=YOUR_API_KEY&core_name=my_solr_core"

cURL — Full crawl with all options

curl -s "https://opensolr.com/solr_manager/api/start_crawl?email=YOUR_EMAIL&api_key=YOUR_API_KEY&core_name=my_solr_core&clean=yes&mode=2&max_threads=5&relax=200000&renderer=chrome&follow_docs=yes&max_pages=10000"

PHP

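<?php
// Build the query string; http_build_query URL-encodes each value.
$params = http_build_query([
    'email'       => 'YOUR_EMAIL',
    'api_key'     => 'YOUR_API_KEY',
    'core_name'   => 'my_solr_core',
    'clean'       => 'yes',    // start fresh instead of resuming
    'mode'        => 2,        // Follow Host Links
    'max_threads' => 5,
    'renderer'    => 'chrome', // render pages through headless Chromium
]);
$response = file_get_contents("https://opensolr.com/solr_manager/api/start_crawl?{$params}");
if ($response === false) {
    // file_get_contents returns false when the request fails.
    exit("Request to start_crawl failed\n");
}
$result = json_decode($response, true);
print_r($result);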

Python

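import requests

# requests URL-encodes the params dict and appends it as the query string.
response = requests.get("https://opensolr.com/solr_manager/api/start_crawl", params={
    "email": "YOUR_EMAIL",
    "api_key": "YOUR_API_KEY",
    "core_name": "my_solr_core",
    "clean": "yes",        # start fresh instead of resuming
    "mode": 2,             # Follow Host Links
    "max_threads": 5,
    "renderer": "chrome",  # render pages through headless Chromium
}, timeout=30)
response.raise_for_status()  # raise on HTTP 4xx/5xx before parsing
print(response.json())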
