FAQ & API Docs


In case the web crawler doesn't work, please try one of the following:

- Remove your index and create a new one in a different instance / region

- Look at your schema.xml and make the following modifications:

  1. Add these fields to your schema.xml, anywhere other fields are defined (next to any existing <field ... /> definition):

      <field name="og_image" type="string" indexed="true" stored="true" />
      <field name="headings1" type="text_general" indexed="true" stored="true" multiValued="true" required="false" default="" />
      <field name="headings2" type="text_general" indexed="true" stored="true" multiValued="true" required="false" default="" />
      <field name="headings3" type="text_general" indexed="true" stored="true" multiValued="true" required="false" default="" />
      <field name="headings4" type="text_general" indexed="true" stored="true" multiValued="true" required="false" default="" />
      <field name="em" type="text_general" indexed="true" stored="true" multiValued="true" required="false" default="" />
      <field name="strong" type="text_general" indexed="true" stored="true" multiValued="true" required="false" default="" />

  2. Find the catchall field definition and change it to this:

      <field name="catchall" type="ignored" indexed="false" stored="false" multiValued="true" required="false" default="" />

  3. OPTIONALLY, if not all pages appear to be indexed, edit your schema.xml and make all date fields ignored (known bug: a mismatched date format in Tika causes the entire document to be discarded):

      <field name="creation_date" type="ignored" indexed="false" stored="false" />
      <field name="release_date" type="ignored" indexed="false" stored="false" />
      <field name="last_save_date" type="ignored" indexed="false" stored="false" />
      <field name="last_modified" type="ignored" indexed="false" stored="false" />
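
The schema edits above can also be applied with a short script. Here is a minimal sketch using Python's standard library, assuming your schema.xml declares <field> elements directly under the root <schema> element (older Solr schemas nest them inside a <fields> block, so adjust accordingly):

```python
# Sketch: apply step 1 (the new crawler fields) to a schema.xml string.
# Assumption: <field> elements live directly under the <schema> root.
import xml.etree.ElementTree as ET

NEW_FIELDS = [
    {"name": "og_image", "type": "string", "indexed": "true", "stored": "true"},
] + [
    {"name": n, "type": "text_general", "indexed": "true", "stored": "true",
     "multiValued": "true", "required": "false", "default": ""}
    for n in ("headings1", "headings2", "headings3", "headings4", "em", "strong")
]

def add_crawler_fields(schema_xml: str) -> str:
    """Insert the crawler fields if they are not already present."""
    root = ET.fromstring(schema_xml)
    existing = {f.get("name") for f in root.iter("field")}
    for attrs in NEW_FIELDS:
        if attrs["name"] not in existing:
            ET.SubElement(root, "field", attrs)
    return ET.tostring(root, encoding="unicode")

# A toy schema, just to show the transformation:
sample = ('<schema name="example">'
          '<field name="id" type="string" indexed="true" stored="true"/>'
          '</schema>')
patched = add_crawler_fields(sample)
```

After downloading the config through the Config Files Editor, run this against the file contents and upload the result back.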

Email us at support@opensolr.com in case you need further assistance.

Upload a new file for crawling and indexing

  1. POST https://opensolr.com/solr_manager/api/index_crawler_file
  2. Parameters:
    1. email - your opensolr registration email address
    2. api_key - your opensolr api_key
    3. core_name - the name of the core you wish to upload the document for
    4. url - optionally, a URL to crawl and index alongside the uploaded document; both will be indexed as separate documents in your opensolr index
    5. userfile - your local document file to POST and upload to the server (pdf, doc, html, xls, mp3, mp4, etc.)
  3. Example here: https://opensolr.com/solr_manager/index_crawler_file
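
The request above can be assembled with just the Python standard library. A sketch with placeholder credentials and file contents (a curl -F command would do the same):

```python
# Sketch: build a multipart/form-data POST for index_crawler_file.
# The email, api_key, core_name, and file contents below are placeholders.
import uuid
from urllib import request

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"', "", value]
    lines += [f"--{boundary}",
              f'Content-Disposition: form-data; name="{file_field}"; '
              f'filename="{filename}"',
              "Content-Type: application/octet-stream", ""]
    body = ("\r\n".join(lines).encode() + b"\r\n" + file_bytes
            + f"\r\n--{boundary}--\r\n".encode())
    return body, f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart(
    {"email": "you@example.com", "api_key": "YOUR_API_KEY",
     "core_name": "my_core"},
    "userfile", "report.pdf", b"%PDF-1.4 placeholder")

req = request.Request(
    "https://opensolr.com/solr_manager/api/index_crawler_file",
    data=body, headers={"Content-Type": content_type})
# request.urlopen(req)  # uncomment to actually send the upload
```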
Yes, our improved web crawler now follows almost any file type found or referenced on your starting ROOT URL, and will index any metadata found in those files.
From HTML to PDF, DOC, PPT, or even MP3 and video files, and any other file type, our web crawler will create your site search engine in just a few minutes.
 

The opensolr web crawler now indexes and follows any file type in your web root.
To learn more about what fields are indexed, simply create a new opensolr index, go to Config Files Editor, and select schema.xml.

All the fields inside the schema.xml are indexed.
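
As a quick way to inspect this from the downloaded config, here is a small sketch (again assuming the standard layout with <field> elements under the <schema> root):

```python
# Sketch: list which fields a schema.xml marks as indexed.
import xml.etree.ElementTree as ET

def indexed_fields(schema_xml):
    """Return the names of all <field> elements with indexed="true"."""
    root = ET.fromstring(schema_xml)
    return [f.get("name") for f in root.iter("field")
            if f.get("indexed") == "true"]

sample = """<schema>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="creation_date" type="ignored" indexed="false" stored="false"/>
</schema>"""
print(indexed_fields(sample))  # ['title']
```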

Here are some DEMO search engines powered by the Opensolr Web Crawler:
- Ziar
- BBC (Arabic)