🤖 Opensolr Web Crawler

Index, enrich, and search your web content automatically with an AI-powered crawler.


What is the Opensolr Web Crawler?

The Opensolr Web Crawler is a robust platform for crawling, indexing, and enriching websites of any size.
It automatically extracts key meta-information, applies Natural Language Processing (NLP) and Named Entity Recognition (NER), and injects all content and structure directly into your Solr index.

  • 🚀 Instantly searchable: Crawled content is available right away through a fully responsive, embeddable search UI.
  • 🤖 AI-driven enrichment: Named entities, sentiment, language detection, and more are extracted on the fly.
  • 🕑 Get started in minutes: Launch a powerful, custom search engine on your data without manual setup.
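
Because crawled content ends up in a regular Solr index, it can also be queried through Solr's standard select handler. The sketch below only assembles such a request URL. The host name and index name are hypothetical placeholders (substitute the values shown for your own index), and the query uses only the standard Solr parameters q, rows, start, and wt, without assuming any crawler-specific field names.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: replace with your own Solr host and index name.
SOLR_BASE = "https://solr.example.com/solr"
INDEX_NAME = "my_crawler_index"


def build_select_url(terms, rows=10, start=0):
    """Build a URL for Solr's standard /select handler.

    Only generic Solr parameters are used, so the query runs against
    whatever default search field the index configuration defines.
    """
    params = {"q": terms, "rows": rows, "start": start, "wt": "json"}
    return f"{SOLR_BASE}/{INDEX_NAME}/select?{urlencode(params)}"


print(build_select_url("web crawler", rows=5))
```

The resulting URL can be fetched with any HTTP client; if the index is protected, credentials have to be supplied separately.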


⚡ Key Features

  • Full NLP and NER:
    Extract people, locations, organizations, and more using OpenNLP.

  • Comprehensive Metadata Extraction:
    Collects meta tags, page structure, creation dates, and document fields.

  • AI-Hints:
    Opensolr AI-Hints are enabled by default for all crawler indexes, delivering rich context and smart search assistance.

  • Automatic Content Language Detection:
    Indexes and searches in any language, with built-in stopword, synonym, and spellcheck support.

  • Responsive, Embeddable Search UI:
    Integrate Opensolr search into your site and customize the top bar, filters, and behavior.

  • Scheduled Recrawling & Live Stats:
    Only new and updated content is fetched, with live stats for crawling and SEO.

  • Secure & Flexible:
    Supports HTTP Auth for protected content and robust backup and replication, and can be fully managed via API or UI.

  • Rich Content Support:
    Indexes and analyzes HTML, doc, docx, xls, PDF, and most image formats, extracting content, metadata, GPS/location data, and sentiment.

  • Crawl Resume:
    Pause and resume crawls anytime; supports cron jobs and incremental indexing.


⚙️ Embedding & Customization

You can embed your Opensolr Web Crawler Search Engine on any website.
Customize your search experience with parameters such as:

  • &topbar=off – Hide the top search tool
  • &q=SEARCH_QUERY – Set the initial search
  • &in=web/media/images – Filter by content type
  • &og=yes/no – Show/hide OG images per result
  • &source=WEBSITE – Restrict to a single domain
  • &fresh=... – Apply result freshness or sentiment bias
  • &lang=en – Filter by language
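
As a rough illustration of how the parameters above combine, the following sketch builds an embed URL from a few of them. The base URL is a hypothetical placeholder (use the address of your own search page), and build_embed_url is an illustrative helper, not part of any Opensolr library. The fresh parameter is left out because its accepted values are not listed here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the URL of your own embedded search page.
BASE_URL = "https://search.example.com/my_index"

CONTENT_TYPES = ("web", "media", "images")


def build_embed_url(base_url, query=None, topbar=True, content_type=None,
                    og_images=None, source=None, lang=None):
    """Assemble an embed URL from the documented query parameters."""
    params = {}
    if query is not None:
        params["q"] = query              # initial search
    if not topbar:
        params["topbar"] = "off"         # hide the top search tool
    if content_type is not None:
        if content_type not in CONTENT_TYPES:
            raise ValueError(f"content_type must be one of {CONTENT_TYPES}")
        params["in"] = content_type      # filter by content type
    if og_images is not None:
        params["og"] = "yes" if og_images else "no"
    if source is not None:
        params["source"] = source        # restrict to a single domain
    if lang is not None:
        params["lang"] = lang            # filter by language
    if not params:
        return base_url
    return base_url + "?" + urlencode(params)


print(build_embed_url(BASE_URL, query="solr crawler", topbar=False,
                      content_type="web", og_images=True, lang="en"))
```

Using urlencode keeps spaces and special characters in the search query safely escaped, which matters when the URL is placed in an iframe src attribute.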

🚀 What’s New

  • AI-Hints: Enabled by default for every crawler index.
  • Automatic Language Detection and advanced NER via OpenNLP.
  • Customizable for any language and analysis pipeline.
  • Full support for spellcheck, autocomplete, backup, and replication.
  • Live SEO and crawling stats, plus sentiment analysis.
  • Automated scheduling and easy management via UI or REST API.

📥 Solr Configuration for Crawling

To enable smooth crawling and full feature support, use our ready-made Solr configs:

  • Solr 9 Config Zip Archive

To ensure all features work as designed, do not manually modify the schema.xml of crawler indexes.

