What is the Opensolr Web Crawler?
The Opensolr Web Crawler offers a seamless solution, effortlessly indexing websites while leveraging robust Natural Language Processing (NLP) and Named Entity Recognition (NER) capabilities. By crawling every page, it automatically extracts and inserts comprehensive meta-information directly into the Solr index. This process ensures that the content is instantly searchable through a fully responsive, embeddable search engine UI, enabling users to create a powerful and tailored search experience within minutes.
Click here for an example Solr API for one of our Demo Web Crawl projects.
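As a rough illustration of what querying such an index looks like, the sketch below sends a standard Solr select request. The endpoint URL, index name, and field names are placeholders, not the demo project's actual values; take the real ones from your Opensolr dashboard or from the demo Solr API link above.

```typescript
// Minimal sketch: query an Opensolr-hosted index via the standard Solr select API.
// The host and index name below are placeholders; replace them with the values
// shown in your own Opensolr dashboard.

const SOLR_BASE = "https://your-server.opensolr.com/solr/your_index"; // placeholder

async function search(query: string, rows = 10): Promise<void> {
  const params = new URLSearchParams({
    q: query,           // the search query, e.g. "open source search"
    rows: String(rows), // how many results to return
    wt: "json",         // ask Solr for a JSON response
  });

  const response = await fetch(`${SOLR_BASE}/select?${params.toString()}`);
  if (!response.ok) {
    throw new Error(`Solr request failed: ${response.status}`);
  }

  const data = await response.json();
  console.log(`Found ${data.response.numFound} documents`);
  for (const doc of data.response.docs) {
    console.log(doc.id); // available fields depend on the Web Crawler's schema.xml
  }
}

search("opensolr").catch(console.error);
```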
Search Engine Demos:
All the search engines below can be embedded into your website, or used as they are. If you embed one, you can also hide the top search bar and customize the search experience by adding the following parameters (see the sketch after this list):
&topbar=off / block - whether or not to show the top search tool
&q=SEARCH_QUERY - a starting search query; leave empty to get all results
&in=web / media / images - search only in web pages, documents, or images
&og=yes / no - whether or not to display the og image for each result
&source=WEBSITE - the domain to restrict the search to; if you have crawled and indexed multiple websites, use this parameter to restrict results to a single domain
&fresh=yes / no / hour / today / previous_week / previous_month / previous_3month / positive / negative - the bias applied to the search results
&lang=en - ISO code of the results language
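As a small sketch of how these parameters combine, the snippet below builds an embed URL from the options listed above. The base URL is a placeholder; replace it with your own search engine's URL from Opensolr.

```typescript
// Minimal sketch: compose an embeddable search URL using the parameters listed above.
// The base URL is a placeholder; replace it with your own Opensolr search engine URL.

const BASE_URL = "https://demo.opensolr.com/search"; // placeholder

function buildEmbedUrl(options: {
  q?: string;
  topbar?: "off" | "block";
  in?: "web" | "media" | "images";
  og?: "yes" | "no";
  source?: string;
  fresh?: string;
  lang?: string;
}): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) params.set(key, value);
  }
  return `${BASE_URL}?${params.toString()}`;
}

// Hide the top bar, search only web pages from one domain, bias toward recent results:
const embedSrc = buildEmbedUrl({
  topbar: "off",
  q: "solr hosting",
  in: "web",
  source: "opensolr.com",
  fresh: "previous_week",
  lang: "en",
});

console.log(embedSrc); // use this as the src of an iframe in your page
```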
To make sure crawling works correctly, use only our Web Crawler Enabled environments, and apply the Solr configuration archive below that corresponds to the Solr version you are using: Solr 9 Config Zip Archive
To learn more about which fields are indexed, simply create a new Opensolr index, go to Config Files Editor, and select schema.xml. To preserve your Web Crawler's functionality, please do not edit your schema.xml fields or any other configuration files.
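If you prefer to inspect the indexed fields programmatically rather than through the Config Files Editor, a sketch using Solr's standard Schema API is shown below; the index URL is a placeholder for your own Opensolr index URL.

```typescript
// Minimal sketch: list the fields defined in an index's schema via Solr's Schema API.
// The base URL is a placeholder; use your own Opensolr index URL from the dashboard.

const SOLR_INDEX_URL = "https://your-server.opensolr.com/solr/your_index"; // placeholder

async function listSchemaFields(): Promise<void> {
  const response = await fetch(`${SOLR_INDEX_URL}/schema/fields?wt=json`);
  if (!response.ok) {
    throw new Error(`Schema request failed: ${response.status}`);
  }
  const data = await response.json();
  for (const field of data.fields) {
    console.log(`${field.name} (${field.type})`); // read-only: do not modify the schema
  }
}

listSchemaFields().catch(console.error);
```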
Quick Video Demo