What is the Opensolr Web Crawler?
It is a bespoke, plug-and-play solution that crawls your entire website and inserts the site's meta information, along with the plain-text version of each of your HTML pages and rich text format documents, into your Opensolr Index. It also creates a fully responsive search engine that you can use in your website. Alternatively, you can use the Solr API directly to create your own search engine UI.
Click here for an example Solr API for one of our Demo Web Crawl projects
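As a rough illustration of using the Solr API directly, the sketch below builds a standard Solr `select` request URL in Python. The host, the index name, and the field names `title` and `text` are assumptions made for the example, not values taken from this page; check the schema.xml of your own index for the real field names. The parameters `q`, `defType`, `qf`, `rows`, and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index_name, user_query, rows=10):
    """Build a Solr select URL for a free-text search.

    base_url and index_name are placeholders for your own Opensolr
    host and index; the qf field names below are hypothetical.
    """
    params = {
        "q": user_query,
        "defType": "edismax",    # Solr's extended DisMax query parser
        "qf": "title^2 text",    # hypothetical fields; title weighted higher
        "rows": rows,            # number of results to return
        "wt": "json",            # ask Solr for a JSON response
    }
    return f"{base_url.rstrip('/')}/{index_name}/select?{urlencode(params)}"


if __name__ == "__main__":
    # example.invalid is a reserved placeholder domain, not a real endpoint
    print(build_search_url("https://example.invalid/solr", "my_index", "solar panels"))
```

The resulting URL can be fetched with any HTTP client, and the JSON response can then be rendered by your own search UI.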
Search Engine Demo:
Fresh News Aggregator Romanian Tech News English Swedish Italian
- Automatic Content Language Detection (beta)
- Can be customised for any language analysis, with stopwords, synonyms, spellcheck, etc.
- Fully responsive, embeddable Search Engine UI
- Automatic, scheduled re-crawling of fresh content only.
- HTTP Auth support, so that the crawler can follow your protected documents and pages.
- Full support for spellcheck and autocomplete.
- Follows and indexes the full content and metadata of the following rich text formats: doc, docx, xls, pdf, and most image file formats.
- Adds content sentiment to each indexed page or document, to help identify potentially hateful content.
- Adds the GPS position from image file metadata, which can be used as a location field in Solr to perform geo-location radius search requests.
- Full live crawling stats that also serve as an SEO tool while crawling.
- Smartly collects the page/document creation date and includes it in the search scoring function, so that fresh results are elevated.
- Automate crawling and get LIVE stats via the Opensolr Web Crawler UI, or via the Automation REST APIs.
- Supports resuming a crawl without losing any data. Crawl parts of your website every day, or on your own cron schedule, by taking advantage of the Automation REST API.
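To make the geo-location radius search and the date-aware scoring from the list above more concrete, here is a minimal Python sketch of how such a request could be expressed with standard Solr syntax. The field names `gps` and `creation_date` are hypothetical; the real names are defined in the schema.xml shipped with the crawler configuration. The `{!geofilt}` filter and the `recip(ms(NOW/HOUR,field),3.16e-11,1,1)` boost are general Solr idioms, and this is not necessarily the exact scoring function the crawler's own search UI uses.

```python
from urllib.parse import urlencode


def geo_radius_filter(field, lat, lon, radius_km):
    """Return a Solr geofilt filter query for a radius search (d is in km)."""
    return f"{{!geofilt sfield={field} pt={lat},{lon} d={radius_km}}}"


def freshness_boost(date_field):
    """Return a multiplicative boost function that favors recent documents."""
    # 3.16e-11 is roughly 1 / (milliseconds in one year), so a document
    # that is one year old gets about half the boost of a brand-new one.
    return f"recip(ms(NOW/HOUR,{date_field}),3.16e-11,1,1)"


def build_params(user_query, lat, lon, radius_km):
    """Combine the text query, the radius filter, and the freshness boost."""
    return {
        "q": user_query,
        "defType": "edismax",                                 # boost needs edismax
        "fq": geo_radius_filter("gps", lat, lon, radius_km),  # hypothetical field
        "boost": freshness_boost("creation_date"),            # hypothetical field
        "wt": "json",
    }


if __name__ == "__main__":
    # Query string to append to your index's /select endpoint
    print(urlencode(build_params("harbor photos", 45.15, -93.85, 5)))
```

Keeping the radius restriction in `fq` leaves the relevance score untouched by distance, while `boost` multiplies the text score by the freshness factor.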
To make sure crawling works correctly, use only our Web Crawler Enabled environments, and apply the Solr configuration files archive below that corresponds to the Solr version you are using:
Solr 7 Config Zip Archive
Solr 8 Config Zip Archive
Solr 9 Config Zip Archive
To learn more about which fields are indexed, simply create a new Opensolr Index, go to the Config Files Editor, and select schema.xml.
In order to preserve your Web Crawler's functionality, please do not edit your schema.xml fields, or any other configuration files.
Quick Video Tours