Opensolr Web Crawler — Site Search Solution
The Opensolr Web Crawler crawls your entire site, extracts structured data, applies NLP + NER, generates vector embeddings, and feeds everything into Solr — fully indexed and ready to search. No manual config. No fiddling with schemas. Just point it at your site and go.
For setup details, assistance, or pricing: support@opensolr.com
Live Demos
Vector Search (AI-powered)
- Home Improvement (RO, 500K+ products)
- Office & Industrial Supplies (DE, 85K+ products)
- News (EN)
- Opensolr (EN)
Keyword Search Demos
- Stiri (RO) | Nyheter (SV) | Fresh News (EN) | Tech News (EN)
- Full Documentation & Testing Guide | Hybrid Search Deep-Dive
Crawl Modes
The crawl mode controls how far the crawler follows links from your starting URL. There are three scope types — each available in full depth or shallow (depth 1) variants.
Mode 1 — Follow Domain Links (full depth)
Crawls all pages across the entire domain, including all subdomains.
Example: Start URL is https://www.example.com/blog
The crawler will follow links to www.example.com, shop.example.com, help.example.com, and anything else on example.com.
Best for: Indexing an entire website including all its subdomains.
Mode 2 — Follow Host Links (full depth)
Crawls only pages on the exact same hostname. Subdomains are treated as separate sites.
Example: Start URL is https://www.example.com/blog
The crawler will follow links on www.example.com only. Links to shop.example.com or help.example.com are ignored.
Best for: Indexing one specific subdomain without pulling in content from other parts of the site.
Mode 3 — Follow Path Links (full depth)
Crawls only pages that start with the same URL path on the same host.
Example: Start URL is https://www.example.com/blog/
The crawler will follow www.example.com/blog/2024/my-post and www.example.com/blog/categories, but will skip www.example.com/about and www.example.com/shop/.
Best for: Indexing a specific section of a website, like a blog, documentation area, or product category.
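The three scope rules above can be sketched as a single URL check. This is a minimal illustration, not the crawler's actual implementation: the function names and the scope labels "domain", "host", and "path" are hypothetical, and the domain comparison naively takes the last two hostname labels, which is an assumption that breaks for suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def naive_domain(host: str) -> str:
    """Return the last two labels of a hostname.

    Assumption: no public-suffix handling, so "shop.example.co.uk"
    would wrongly reduce to "co.uk".
    """
    return ".".join(host.lower().split(".")[-2:])


def in_scope(start_url: str, candidate_url: str, scope: str) -> bool:
    """Decide whether candidate_url falls inside the crawl scope of start_url.

    scope is "domain" (Mode 1), "host" (Mode 2), or "path" (Mode 3);
    these labels are hypothetical names used only in this sketch.
    """
    start, cand = urlsplit(start_url), urlsplit(candidate_url)
    start_host = (start.hostname or "").lower()
    cand_host = (cand.hostname or "").lower()

    if scope == "domain":
        # Any subdomain of the same domain is accepted.
        return naive_domain(cand_host) == naive_domain(start_host)
    if scope == "host":
        # The hostname must match exactly.
        return cand_host == start_host
    if scope == "path":
        # Same host, and the candidate path begins with the start path.
        return cand_host == start_host and cand.path.startswith(start.path)
    raise ValueError(f"unknown scope: {scope!r}")
```

Because the path rule here is a plain prefix match, a start URL ending in /blog would also match /blogging; ending the start URL with a trailing slash, as in the Mode 3 example, avoids that in this sketch.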
Mode 4 — Shallow Domain Crawl (depth 1)
Same domain-level scope as Mode 1, but only discovers links from the start page and its direct children. Pages found deeper are crawled but don't contribute new links.
Example: Start URL is https://www.example.com
The crawler reads the homepage, finds 50 links, and crawls those 50 pages, but does not follow any links found on those 50 pages.
Best for: A shallow crawl of top-level content — landing pages, product listings, or news homepages where you only want the first layer.
Mode 5 — Shallow Host Crawl (depth 1)
Same host-level scope as Mode 2, combined with depth-1 link discovery. Stays on the exact hostname and only follows links from the start page and its direct children.
Best for: A quick, shallow index of a single subdomain.
Mode 6 — Shallow Path Crawl (depth 1)
Same path-level scope as Mode 3, combined with depth-1 link discovery. Stays within the URL path and only follows links from the start page and its direct children.
Best for: A focused, shallow crawl of a specific section — useful for quickly indexing a product catalog or documentation area without going deep.
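To make the difference between full-depth and shallow modes concrete, here is a small sketch of a breadth-first crawl over an in-memory link graph. It models depth 1 the way the Mode 4 example describes it (the start page's links are crawled, links on those pages are not followed); the exact cutoff the real crawler applies is an assumption, and the function name, the max_depth parameter, and the sample site are hypothetical.

```python
from collections import deque
from typing import Optional


def crawl(start: str, links: dict, max_depth: Optional[int] = None) -> list:
    """Breadth-first crawl over a dict mapping each page to the pages it links to.

    max_depth=None follows links at any depth (full-depth modes).
    max_depth=1 crawls the start page and the pages it links to,
    and ignores links found on those pages (shallow modes, as assumed here).
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if max_depth is not None and depth >= max_depth:
            continue  # the page is crawled, but its links are not followed
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical four-page site used only for illustration.
site = {
    "/": ["/a", "/b"],
    "/a": ["/a/deep"],
    "/b": [],
    "/a/deep": [],
}
shallow = crawl("/", site, max_depth=1)
full = crawl("/", site)
print(shallow)
print(full)
```

In this sample, the shallow crawl stops before /a/deep, while the full-depth crawl reaches it. A scope check for domain, host, or path would be applied to each target before it is queued.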
Embedding & Customization
Embed your Opensolr Web Crawler search on any website. Customize behavior with URL parameters.
Important: To embed the search UI on your website, contact us to have your domain whitelisted and approved for iframe embedding.
| Parameter | Description |
|---|---|
| &topbar=off | Hide the top search bar |
| &q=SEARCH_QUERY | Set the initial search query |
| &in=web/media/images | Filter by content type |
| &og=yes/no | Show or hide OG images per result |
| &source=WEBSITE | Restrict results to a single domain |
| &fresh=... | Apply result freshness or sentiment bias |
| &lang=en | Filter by language |
| &pagination_style=scroll/pages | Infinite scroll (default) or numbered pages |
| &ui_theme=light/dark | Color theme |
| &layout=default/fullwidth | Container width |
| &locale=en_us/de_de/ro_ro | Filter by OG locale metadata |
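As an illustration of combining these parameters, the sketch below builds an embed URL and wraps it in an iframe tag. The base URL is a hypothetical placeholder (substitute the Search UI URL of your own index), the helper name is made up for this example, and the iframe dimensions are arbitrary; only the parameter names come from the table above.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical placeholder; replace with the Search UI URL of your own index.
SEARCH_UI_BASE = "https://search.example.com/my_index"


def build_embed_url(base: str, **params: str) -> str:
    """Append URL parameters to the base URL, URL-encoding each value."""
    query = urlencode(params)
    if not query:
        return base
    # Use "&" when the base URL already carries a query string.
    separator = "&" if "?" in base else "?"
    return f"{base}{separator}{query}"


url = build_embed_url(
    SEARCH_UI_BASE,
    q="solar panels",
    topbar="off",
    ui_theme="dark",
    pagination_style="pages",
    lang="en",
)

# Escape the URL so "&" is valid inside the HTML attribute.
iframe = f'<iframe src="{escape(url, quote=True)}" width="100%" height="800"></iframe>'
print(iframe)
```

The iframe will only render once your domain has been whitelisted for embedding, as noted above.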
What's New
- AI-Hints enabled by default for every crawler index.
- Automatic Language Detection and advanced NER via OpenNLP.
- Customizable for any language and analysis pipeline.
- Full support for spellcheck, autocomplete, backup, and replication.
- Live SEO & crawling stats and sentiment analysis.
- Pause & Resume with schedule management via UI or REST API.
- Schedule Optimize — set your index to auto-optimize on a recurring schedule.
Solr Configuration
To enable smooth crawling and full feature support, use the ready-made Solr configs:
To ensure all features work as designed, do not manually modify the schema.xml of a crawler index.
Query Elevation — Pin & Exclude Search Results
Take full control of what your users see. Query Elevation lets you pin important results to the top or exclude irrelevant ones — directly from the Search UI, with zero code and no reindexing required.
- Pin — Force a specific result to the top of the list for a given search query
- Exclude — Hide a result completely so it never appears for that query
- Pin All / Exclude All — Apply the rule globally, across every search query
- Drag & drop — Reorder pinned results to control exactly which one shows first
Enable it from your index settings panel, then open the Search UI — every result gets an elevation toolbar. Perfect for promoting landing pages, burying outdated content, or curating high-value queries.
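The pin and exclude semantics described above can be pictured with a short sketch that reorders a ranked list of result IDs. This is only a conceptual model, not how Opensolr or Solr implements elevation: the function name, the rule dictionaries, and the "*" key standing in for Pin All / Exclude All are all hypothetical, and the sketch assumes that an exclude wins over a pin for the same result.

```python
GLOBAL = "*"  # hypothetical key for rules that apply to every query


def elevate(query: str, results: list, pins: dict, excludes: dict) -> list:
    """Return results with pinned IDs first (in rule order) and excluded IDs removed.

    pins and excludes map a query string (or GLOBAL) to a list of result IDs.
    """
    pinned = pins.get(query, []) + pins.get(GLOBAL, [])
    hidden = set(excludes.get(query, [])) | set(excludes.get(GLOBAL, []))

    top = []
    for doc in pinned:
        # Assumption: an excluded result stays hidden even if it is also pinned.
        if doc not in hidden and doc not in top:
            top.append(doc)

    rest = [doc for doc in results if doc not in hidden and doc not in top]
    return top + rest


# Hypothetical ranked results and rules used only for illustration.
ranked = ["old-post", "pricing", "blog", "landing"]
pins = {"solr hosting": ["landing", "pricing"]}
excludes = {GLOBAL: ["old-post"]}

print(elevate("solr hosting", ranked, pins, excludes))
print(elevate("anything else", ranked, pins, excludes))
```

The order of IDs in a pin list plays the role of the drag-and-drop ordering: the first pinned ID is shown first.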