Opensolr Web Crawler - Site Search Solution

The Opensolr Web Crawler follows and indexes the HTML and plain-text content of your entire website. It also creates a sample embedded search engine that is ready to plug into your website or application.

Update - Nov 16 2021
What's new:

  • HTTP Auth, so that the crawler can follow your protected documents/pages.
  • Follows and indexes the full content and metadata of the following rich-text formats: doc, docx, xls, pdf, and most image file formats.
  • Adds a content sentiment value to each indexed page/document, in order to identify potentially hateful content.
  • Adds the GPS position found in image file metadata, which can be used as a location field in Solr to perform geo-location radius searches.
  • Full live crawling stats that also serve as an SEO tool while crawling.
  • Collects the creation date of each page/document and includes that date in the search scoring function, so that fresher results are elevated.
  • Automate crawling and get LIVE stats via the Opensolr Web Crawler UI, or via the Automation REST APIs.
  • Start your crawl with clean=no in order to ONLY grab fresh content on your starting URL (homepage).
  • Takes advantage of multiple CPU cores, through our multi-threaded implementation, for a dramatically faster crawling speed.
  • Supports resuming a crawl without losing any data. Crawl parts of your website every day, or on your own cron schedule, by using the Automation REST API.
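
As an illustration of the geo-location radius search mentioned in the list above, here is a minimal sketch that builds the request parameters for Solr's standard geofilt filter. The field name "location" is only an assumption for this example; check your index's schema.xml for the actual location field name. The helper geo_radius_params is hypothetical and not part of any Opensolr library.

```python
from urllib.parse import urlencode


def geo_radius_params(query, lat, lon, radius_km, field="location"):
    """Build Solr query parameters for a radius search around (lat, lon).

    `field` is an ASSUMED location field name; use the one from your schema.xml.
    """
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    return {
        "q": query,
        # geofilt keeps only documents within `d` kilometres of `pt`
        "fq": "{!geofilt}",
        "sfield": field,
        "pt": f"{lat},{lon}",
        "d": str(radius_km),
        # nearest documents first
        "sort": "geodist() asc",
    }


if __name__ == "__main__":
    params = geo_radius_params("*:*", 45.15, -93.85, 5)
    # Append this query string to your index's /select URL.
    print(urlencode(params))
```

The parameters are returned as a dictionary so that they can be passed to any HTTP client, or URL-encoded as shown in the usage at the bottom.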

Important: Only works with Solr version 7+

To learn more about which fields are indexed, simply create a new Opensolr index, go to the Config Files Editor, and select schema.xml.
In order to preserve your Web Crawler's functionality, please do not edit your schema.xml fields, or any other configuration files.
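
If you prefer to inspect the indexed fields outside the editor, the following read-only sketch lists the field names and types from a copy of schema.xml. It assumes you have saved the file's contents locally; the helper list_schema_fields is hypothetical, and it only reads the schema, so it does not change your configuration.

```python
import xml.etree.ElementTree as ET


def list_schema_fields(schema_xml):
    """Return (name, type) pairs for every <field> element in a Solr schema.

    `schema_xml` is the schema.xml content as str or bytes.
    """
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so it works whether or not the
    # <field> elements are wrapped in a <fields> container.
    return [(el.get("name"), el.get("type")) for el in root.iter("field")]


if __name__ == "__main__":
    # Assumes a local copy saved as schema.xml in the current directory.
    with open("schema.xml", "rb") as fh:
        for name, field_type in list_schema_fields(fh.read()):
            print(f"{name}: {field_type}")
```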

To make sure crawling is done correctly, use only our Solr 7+ servers, and apply the Solr configuration archive below that corresponds to your Solr version:

Solr 7 Config Zip Archive
Solr 8 Config Zip Archive

Important: The out-of-the-box version of the Web Crawler is only available for small to mid-sized websites of under 200 pages.
For larger websites, please Contact Us to get a quote for an Opensolr Indexing and Search Resilient Cluster.

Screenshots