Discover a seamless, AI-powered way to index, enrich, and search your web content—automatically.
The Opensolr Web Crawler is a robust platform for crawling, indexing, and enriching websites of any size.
It automatically extracts key meta-information, applies Natural Language Processing (NLP) and Named Entity Recognition (NER), and injects all content and structure directly into your Solr index.
You can also try the Solr API for a live crawl.
Full NLP and NER:
Extract people, locations, organizations, and more using OpenNLP.
Comprehensive Metadata Extraction:
Collects meta tags, page structure, creation dates, and document fields.
AI-Hints:
Opensolr AI-Hints are enabled by default for all crawler indexes, delivering rich context and smart search assistance.
Automatic Content Language Detection:
Indexes and searches in any language, with built-in stopword, synonym, and spellcheck support.
Responsive, Embeddable Search UI:
Integrate Opensolr search into your site, and customize the top bar, filters, and behavior.
Scheduled Recrawling & Live Stats:
Only new and updated content is fetched, with live stats for crawling and SEO.
Secure & Flexible:
Supports HTTP Auth for protected content, offers robust backup and replication, and is fully manageable through the API or UI.
Rich Content Support:
Indexes and analyzes HTML, doc, docx, xls, PDF, and most image formats—extracting content, meta, GPS/location data, and sentiment.
Crawl Resume:
Pause and resume crawls anytime; supports cron jobs and incremental indexing.
You can embed your Opensolr Web Crawler Search Engine on any website.
Customize your search experience with parameters such as:
- &topbar=off – Hide the top search tool
- &q=SEARCH_QUERY – Set the initial search
- &in=web/media/images – Filter by content type
- &og=yes/no – Show/hide OG images per result
- &source=WEBSITE – Restrict to a single domain
- &fresh=... – Apply result freshness or sentiment bias
- &lang=en – Filter by language
To enable smooth crawling and full feature support, use our ready-made Solr configs:
- Solr 9 Config Zip Archive
To ensure all features work as designed, do not manually modify the schema.xml of your crawler indexes.
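As an illustration of the embed parameters listed above, the following Python sketch assembles a search URL from them. The base URL is a hypothetical placeholder (the real embed address comes from your own Opensolr account), the helper name build_search_url is invented for this example, and the parameter values shown are assumptions rather than documented defaults.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the embed URL of your own crawler index.
EMBED_BASE_URL = "https://search.example.com/my_index"


def build_search_url(base_url, params):
    """Append every parameter whose value is not None to base_url as a query string."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{base_url}?{query}" if query else base_url


# Example values only; the parameter names come from the list above.
url = build_search_url(
    EMBED_BASE_URL,
    {
        "topbar": "off",       # hide the top search tool
        "q": "solr tutorial",  # initial search query
        "in": "web",           # content type: web, media, or images
        "og": "yes",           # show OG images per result
        "source": None,        # omitted: no single-domain restriction
        "lang": "en",          # filter by language
    },
)
print(url)
```

Passing the parameters as a dictionary avoids a clash with Python's reserved word `in`, and `urlencode` takes care of escaping spaces and special characters in the query.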