API - Start the Web Crawler
- GET https://opensolr.com/solr_manager/api/start_crawl
- Parameters:
- email - (required) your opensolr registration email address
- api_key - (required) your opensolr api_key
- core_name - (required) the name of the index you wish to start the web crawler process for
- follow_docs - (optional) follow documents and images? (yes/no). Default: no
- clean - (optional) start fresh, or resume from where you left off? (yes/no). Default: no.
- auth_username - (optional) if your starting URLs are using Basic HTTP Auth, you can enter the username here.
- auth_password - (optional) if your starting URLs are using Basic HTTP Auth, you can enter the password here.
- mode - (optional) the crawl mode that controls how far the crawler follows links. Possible values: 1 through 6. Default: 1.
| Mode | Name | Scope | Description |
| --- | --- | --- | --- |
| 1 | Follow Domain Links | Full depth | Follows all links across the entire domain, including subdomains. Starting from www.site.com, it will also crawl shop.site.com, blog.site.com, etc. |
| 2 | Follow Host Links | Full depth | Stays on the exact hostname only. Starting from www.site.com, links to shop.site.com are ignored. |
| 3 | Follow Path Links | Full depth | Stays within the URL path prefix on the same host. Starting from site.com/blog/, it will crawl site.com/blog/2024/post but skip site.com/about. |
| 4 | Shallow Domain Crawl | Depth 1 | Same domain scope as Mode 1, but only discovers links from the start page and its direct children. Pages found deeper are crawled but don't contribute new links to the queue. |
| 5 | Shallow Host Crawl | Depth 1 | Same host scope as Mode 2, but only discovers links from the start page and its direct children. Useful for a quick, shallow index of a single subdomain. |
| 6 | Shallow Path Crawl | Depth 1 | Same path scope as Mode 3, but only discovers links from the start page and its direct children. Useful for quickly indexing a product catalog or documentation section without going deep. |

Full depth = the crawler keeps discovering and following new links at every level.
Depth 1 = the crawler reads the start page and follows the links it finds there, but stops discovering new links after that.
- max_threads - (optional) number of concurrent crawler threads. Default: 10. Higher values speed up crawling but use more bandwidth.
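The scoping rules for the three full-depth modes can be sketched as URL predicates. This is an illustrative approximation only, not OpenSolr's actual implementation; the helper names are hypothetical, and the domain check is a naive two-label heuristic.

```python
from urllib.parse import urlparse

def registered_domain(host: str) -> str:
    # Naive two-label heuristic for illustration only
    # (does not handle multi-part TLDs like .co.uk).
    return ".".join(host.split(".")[-2:])

def in_scope(start_url: str, candidate_url: str, mode: int) -> bool:
    """Approximate the link-scoping rules for crawl modes 1-3."""
    start, cand = urlparse(start_url), urlparse(candidate_url)
    if mode == 1:  # Follow Domain Links: any subdomain of the same domain
        return registered_domain(cand.netloc) == registered_domain(start.netloc)
    if mode == 2:  # Follow Host Links: exact hostname match only
        return cand.netloc == start.netloc
    if mode == 3:  # Follow Path Links: same host AND same path prefix
        return cand.netloc == start.netloc and cand.path.startswith(start.path)
    raise ValueError("modes 4-6 apply the same scopes with depth-1 link discovery")
```

Modes 4 to 6 reuse these same predicates; they differ only in that new links are discovered no deeper than the start page's direct children.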
- relax - (optional) delay between requests in microseconds. Controls crawl politeness. Default: 100000 (0.1s). Higher values are more polite to the target server.
- max_traffic - (optional) maximum allowed traffic in GB. Default: your plan limit.
- max_pages - (optional) maximum number of pages to crawl. Default: your plan limit.
- max_filesize - (optional) maximum file size to download per page, in KB. Default: your plan limit.
- renderer - (optional) the rendering engine to use for fetching pages. Possible values: curl or chrome. Default: curl.
| Value | Name | Description |
| --- | --- | --- |
| curl | Curl (Fast) | Fast HTTP fetch with no browser overhead. Best for most websites: server-rendered HTML, static sites, WordPress, Drupal, etc. Pages are fetched in ~0.2s each. |
| chrome | Chrome (JS Rendering) | Every page is rendered through a headless Chromium browser (Playwright). Use for JavaScript-heavy SPAs built with React, Vue, Angular, Next.js, etc. Adds ~0.5–1s per page. |
Example: https://opensolr.com/solr_manager/api/start_crawl?email=PLEASE_LOG_IN&api_key=PLEASE_LOG_IN&core_name=my_solr_core
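Since all parameters travel in the query string of a GET request, the call can be assembled with any HTTP client. A minimal Python sketch follows; the helper name and the example credential values are illustrative, not part of the API.

```python
from urllib.parse import urlencode

BASE = "https://opensolr.com/solr_manager/api/start_crawl"

def build_start_crawl_url(email: str, api_key: str, core_name: str, **options) -> str:
    """Assemble the start_crawl request URL.

    `options` may carry any of the optional parameters documented above,
    e.g. mode=3, renderer="chrome", max_threads=5, relax=500000.
    """
    params = {"email": email, "api_key": api_key, "core_name": core_name}
    params.update(options)
    return BASE + "?" + urlencode(params)

# Example: a polite, path-scoped crawl rendered with headless Chrome.
url = build_start_crawl_url(
    "you@example.com", "YOUR_API_KEY", "my_solr_core",
    mode=3, renderer="chrome", max_threads=5, relax=500000,
)
```

The resulting URL can then be fetched with any client (e.g. `urllib.request.urlopen(url)` or `curl`) to start the crawl.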
Note: All parameters are saved and preserved across scheduled cron restarts. If you pause and resume the crawler, or if the cron schedule restarts it, your original settings (threads, relax delay, traffic limits, etc.) are retained.