Documentation


Solr will often use quite a lot of RAM in order to build the search results response.

Therefore, it is important that we follow a few Best Practices, to ensure that we do not use more resources than are actually needed.

It often happens that a dedicated Opensolr Environment with quite an extensive amount of RAM still won't be able to handle a rather small Solr index, because of the wrong implementation of certain schema.xml configuration parameters, or because of requests that cause Solr to be killed by the OOM script when the Environment runs out of memory.

Opensolr also has a self-healing process that will kick in for any crashed Solr process, recovering the Solr service in just under 1 minute.

Here are some Best Practices you can use to mitigate these effects:

  1. We have described here a few methods to Save Transfer Bandwidth, which, in turn, also help with memory management.
  2. Are you using Solr to return a very large number of documents in one request?
    1. That will cause Solr to allocate memory for all the data in your index, keep that cached, and reallocate new memory for each new query.
    2. The solution is to keep the &rows parameter at a value under 100 (&rows=100) as much as possible, and not ask Solr to return more data than is necessary in each request (see the example query after this list).
  3. Are you paginating over a very large number of pages?
    1. Asking Solr to return documents starting from a very high offset will again cause it to allocate memory for all the data in the index and keep that cached, which will quickly exhaust the Environment's RAM.
    2. The solution is to paginate only to a reasonable number of pages, as much as possible. In other words, keep the start parameter below 50,000 (&start=50000&rows=100).
    3. This also depends on how many stored vs. indexed fields you have: the more stored fields you define in your schema.xml, the more RAM a high &start value will use, since Solr has to allocate memory for each field's data.
  4. Are you using heavy Faceting, Sorting, Highlighting, or Group queries?
    1. As a best practice, all of these should be done on docValues=true fields.
    2. That is to say, in your schema.xml, you should define the fields used for Faceting, Sorting, and Grouping with docValues=true.
      1. Example: 
        <field name="name" docValues="true" type="text_general" indexed="true" stored="true" />
    3. In some cases, defining more parameters will be useful, especially when using highlighting on certain fields.
      1. Example: 
        <field name="description" type="text_general" indexed="true" stored="true" docValues="true" termVectors="true" termPositions="true" termOffsets="true" storeOffsetsWithPositions="true" />
  5. Field, Filter, Query and Document Caching
    1. In many cases, using such caching on your Solr index will do more harm than good, since those caches won't always be hit, or the hit ratio will be very low, in which case memory is filled with useless caches.
    2. The solrconfig.xml file, has multiple caching configurations that can be tweaked in order to ensure that caching is not overused:
      1. filterCache - stores unordered lists of document ids that have been returned by the “fq” (filter query) parameter of your queries.
      2. queryResultCache - stores document ids returned by searches
      3. documentCache - caches fieldValues that have been defined as “stored” in the schema, so that Solr does not have to go back to the index to fetch and return them for display.
      4. fieldCache - used to store all of the values for a field in memory rather than on disk. For a large index, the fieldCache can occupy a lot of memory, especially if caching many fields.
    3. The solution is finding the definitions of these caches in your solrconfig.xml and setting them to a value as low as possible. 
      1. Example:
        <filterCache size="1" initialSize="1" autowarmCount="0"/>
  6. If using Drupal, make sure you update your Search API Solr module to the latest version, in order to fix this known bug.
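
As a quick illustration of points 2 and 3 above, here is a well-bounded query (the server and index names are placeholders):

https://your-server.solrcluster.com/solr/your_index/select?q=laptop&start=0&rows=100

Keeping &start and &rows low like this lets Solr allocate memory only for the documents it actually has to return.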

 

If you get the Solr error: "too many boolean clauses", please check your synonyms.txt, stopwords.txt, or protwords.txt, and try to make those files smaller.

In other words, Solr tries to apply a boolean clause for each of the words found in any of those files, depending on your schema.xml configuration.

A quick fix is to remove some of the synonyms from synonyms.txt, or words from the other .txt files shown here. You can also take a look at your schema.xml and make sure that your synonyms, stopwords, and protwords are configured properly in the chain of tokenizers and filters of your fieldType definitions.

Also, try not to apply synonyms.txt at indexing time, as that replaces many of the original words with their synonyms, considerably increasing the size of your index, and it can also make search far less relevant in some cases.

Here's an example setup for synonyms.txt usage in a text_general field, which we use for our Web Crawler Site Search solution:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="0" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="0"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


 

Enable Spellcheck In Solr

Enabling spellcheck in Apache Solr is a useful feature that allows you to provide suggestions for misspelled or incorrect search queries. To enable spellcheck in Solr, you need to configure your Solr schema, Solr configuration files, and query parameters. Here's a step-by-step guide on how to do it:

  1. Schema Configuration:
    1. Open your Solr schema configuration file (usually named schema.xml) located in your Solr core's conf directory.
    2. Add a field type that specifies how you want to handle text for spellchecking. You can use the TextField type, for example:
      1. <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        </fieldType>
    3. Define a new field that uses this field type for your spellcheck suggestions. This field should be used for indexing your content.
      1. <field name="content" type="textSpell" indexed="true" stored="true"/>
    4. Add a new field from which Solr will build its spellcheck dictionary (typically populated from your content field, for example via a copyField).
      1. <field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
  2. Solr Configuration:
    1. Open your Solr configuration file (usually named solrconfig.xml) located in your Solr core's conf directory.
    2. Find the <requestHandler> configuration section for your search endpoint (e.g., /select) and add the spellcheck component to it. You should also configure other parameters as needed.
      1. <requestHandler name="/select" class="solr.SearchHandler"> 
        <!-- ... -->
        <arr name="last-components">
        <str>spellcheck</str>
        </arr>
        </requestHandler>
  3. Spellcheck Component Configuration:
    1. Still in the solrconfig.xml file, configure the spellcheck component. You can define its settings under the <searchComponent> section.
      1. <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
        <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.5</float>
        <int name="maxEdits">2</int>
        <int name="minPrefix">1</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">3</int>
        <float name="maxQueryFrequency">0.5</float>
        </lst>
        </searchComponent>
  4. Reindex Data:
    1. After making these schema and configuration changes, you need to reindex your data.
  5. Querying with Spellcheck:
    1. When making a search query to Solr, you can enable spellcheck suggestions by adding the spellcheck parameter to your query:
      1. /select?q=your_query&spellcheck=true
    2. Solr will return spellcheck suggestions in the response, typically under the spellcheck section (see the fuller example query below).
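
For instance, a slightly fuller query (these are standard Solr spellcheck parameters; spellcheck.collate also asks Solr to return a re-written query built from the top suggestions):

/select?q=your_query&spellcheck=true&spellcheck.dictionary=default&spellcheck.collate=true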

By following these steps, you should be able to enable spellcheck in Apache Solr and provide search query suggestions for misspelled terms. Make sure to adjust the configuration parameters according to your specific use case and requirements.

 

Please be advised that your Opensolr Index may fail to reload when using AnalyzingInfixSuggester.

It turns out that Drupal exports the Solr configuration zip archive erroneously.

Basically, you will need to manually edit solrconfig_extra.xml in order to explicitly specify a separate folder for each suggester dictionary.
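
As a sketch (the suggester names, fields, and paths below are hypothetical), giving each suggester its own indexPath keeps their dictionaries from colliding:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">title_suggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_index_title</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
  <lst name="suggester">
    <str name="name">body_suggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_index_body</str>
    <str name="field">body</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>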

You can learn more from the bug report filed with the Drupal community.

 

It is often the case (as it is with Drupal) that your config files will contain files like schema_extra.xml or solrconfig_extra.xml.

In this case, your main schema.xml and solrconfig.xml will contain references to various fields and types that are defined in those extra files.

Therefore, you need to split your config files archive into multiple archives and upload them as follows:

- First, upload the extra files (zip up schema_extra.xml and the other *_extra.xml files and upload that zip first)

- Second, upload the main schema.xml file, along with all other resource files, such as stopwords.txt, synonyms.txt, etc.

- Third, upload a zip archive that contains solrconfig.xml alone.

Solr works with a set of multiple configuration files, and each configuration file has its own purpose.

Therefore, in some cases, some publishers (CMS systems, etc.) will choose to create their own structure for these Solr configuration files, as is the case with Drupal, and maybe WordPress (WPSOLR), and others.

When uploading your Solr configuration files using your Opensolr Index Control Panel, it is therefore important to upload your files in a specific order:

  1. Create and upload a .zip archive containing all your dependency config files, such as the .txt files, schema_extra.xml, and solrconfig_extra.xml: pretty much everything except schema.xml and solrconfig.xml.
  2. Create and upload a .zip archive containing your schema.xml file, since it defines all the fields and references the archive you uploaded before (the one that contains schema_extra.xml and the like).
  3. Create and upload a .zip archive containing your solrconfig.xml file, since it references field definitions inside your schema.xml and other dependency files.

So, basically, you should simply create those 3 archives and upload them separately, in this exact order, and then everything should work.
You can, of course, automate this by using the Automation REST API to upload your config files.

If you get the error:

Undefined field _text_

Please make sure to open up solrconfig.xml in your Opensolr Control Panel Admin UI and remove the reference to the _text_ field under the /update initParams:

<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
  <lst name="defaults">
    <str name="df">_text_</str>
  </lst>
</initParams>
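
After removing the df reference, the snippet would look like this (keep your existing path list as-is):

<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
  <lst name="defaults"/>
</initParams>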
The setting you need has to do with formdataUploadLimitInKB, which is found in solrconfig.xml.

If you take a look at your solrconfig.xml file, there is an area that sets that up:

Open up your Opensolr index:
https://opensolr.com/admin/solr_manager/tools/INDEX_NAME
Go to the Config Files Editor tab
Select solrconfig.xml
Scroll down until you see the <requestParsers> directive. There, you'll see the settings:
multipartUploadLimitInKB="2048000"
formdataUploadLimitInKB="2048"

So, formdataUploadLimitInKB is what you need to change in your own solrconfig.xml file in your indexes.
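
For reference, here is a sketch of that directive with an increased form-data limit (the values are illustrative; adjust them to your needs):

<requestParsers enableRemoteStreaming="true"
                multipartUploadLimitInKB="2048000"
                formdataUploadLimitInKB="20480"
                addHttpRequestToContext="false"/>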
 
 
HOWEVER #1: If you need a ridiculously large number of boolean clauses (as Solr calls them), you won't be able to do it by default, since Solr will always return the error: too many boolean clauses.
You can fix that by increasing the maxBooleanClauses parameter in your solrconfig.xml.
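
For example (the value is illustrative; the default in most Solr versions is 1024):

<maxBooleanClauses>4096</maxBooleanClauses>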
 
 
HOWEVER #2: For better query performance, you could consider alternative techniques, such as splitting one very large query into multiple queries and aggregating the results in your own app.
 
For example:
 
Query 1: 
https://opensolr-server.solrcluster.com/solr/production/select?q=*:*&fq=field_name:(VAL1 OR ... OR VALn)
- where n is small enough that the number of OR'ed values stays below the maxBooleanClauses set in your solrconfig.xml
 
Query 2: 
https://opensolr-server.solrcluster.com/solr/production/select?q=*:*&fq=field_name:(VALn+1 OR ... OR VALm)
- where the chunk from VALn+1 to VALm likewise contains fewer values than the maxBooleanClauses set in your solrconfig.xml
...
Query i... (you get the point)
 
You then merge the results from all those queries.
You can figure out the number of queries you have to make by dividing the number of OR clauses you need by the maxBooleanClauses setting in your solrconfig.xml
 
And, needless to say, if you have your data replicated on multiple servers behind a load balancer (AKA our Opensolr Resilient Cluster Solution), your multiple queries will be load balanced, which in turn results in much faster queries and better resiliency.

YES, however, it's only active on some servers right now.
Please ask us to install that, or any other Solr plugin library, by following the guide here, and we'll be happy to set it up for you.

If you keep getting redirected to the Login page, or you are having trouble placing a new order after trying to log in multiple times, please try clearing the Opensolr cookies, or use a different browser.

   

You send it to support@opensolr.com, and one of our tech people will install it on the server you're using (you'll need to specify the index name so we can identify the server) within about 24 hours (it usually takes just a couple of hours if the plugin is fully compatible and all goes well).

Make sure you send the .jar file itself, and it would help if you also make sure it's compatible with the Solr version you're currently using, as shown in your Opensolr control panel.

Please don't send git repositories, as we won't build the .jar library on our end.

Sometimes, in the shared Opensolr cloud, the data folder may get corrupted, so it can't be read from or written to.
One easy fix for this is to simply remove your index and then create another one, preferably under another name.

If that doesn't work, please contact us, and we'll be happy to fix it up for you.

Also, keep in mind that there may be more reasons, so please make sure to check your error log by clicking the Error Log button inside your Opensolr index control panel, and keep refreshing that page to make sure the errors you see are accurate.

If you do see errors in there, please email them to us, at support@opensolr.com and we'll fix it for you.

 

Click on the Tools menu item on the right-hand side, and then simply use the form to create your query and delete data.

To move from using the managed-schema to schema.xml, simply follow the steps below:

In your solrconfig.xml file, look for the schemaFactory definition. If you have one, remove it and add this instead:

<schemaFactory class="ClassicIndexSchemaFactory"/>

If you don't have one, just add the above snippet somewhere above the requestHandler definitions.

 

To move from using the classic schema.xml in your Opensolr index to the managed-schema, simply follow the steps below:

 

In your solrconfig.xml, look for a SchemaFactory definition, and replace it with this snippet:

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

 

If you don't have any schemaFactory definition, just paste the above snippet into your solrconfig.xml file, just above any requestHandler definition.

Opensolr now supports any Solr Version that may be required by your project.

Solr Versions provided by Opensolr.com

 

If you usually get an error such as: Unknown field... or Missing field..., and your schema.xml already contains those fields, make sure you disable the Schemaless mode in solrconfig.xml.

Just head on to the Config Files Editor in your opensolr index control panel, and locate a snippet that looks like this:

class="ManagedIndexSchemaFactory"

According to the Solr documentation, you can disable the ManagedIndexSchemaFactory as per the instructions below:

To disable the dynamic schema REST APIs, use the following for the schemaFactory:

<schemaFactory class="ClassicIndexSchemaFactory"/>

Also do not forget to remove the entire snippet regarding the ManagedIndexSchemaFactory, so that you won't accidentally use both.

Yes, Opensolr now supports the JTS Topology Suite by default, which does not come bundled with the default Solr distribution.
It should be enabled on most of our servers and datacenters; however, if it doesn't seem to work for your index, please Contact Support and we'll be happy to enable it for you.
No further setup will be required on your part.

Please go to https://opensolr.com/pricing and make sure you select the Analytics option from the Extra Features tab when you upgrade your account.

If you can see analytics but no data, make sure your Solr queries are correctly formatted in the form:
https://server.opensolr.com/solr/index_name/select?q=your_query&other_params... 

So, the search query must be clearly visible in the q parameter in order for it to show in analytics. 

Here are a few ways to save your monthly allotted bandwidth:

  1. Using local caching, such as memcache, can greatly reduce the number of actual requests made to our Solr servers, thus saving you bandwidth.
  2. Since the allocated bandwidth is PER INDEX, you could set up Solr Replication and configure your local application to perform round-robin requests across all of your replicas. This way, bandwidth is saved by balancing it between the resources of multiple indexes. For example, if your account has 1 GB PER INDEX, you can create Index A and replicate it onto Index B, then make requests to both in a round-robin fashion, thus gaining 2 GB of total bandwidth for your index.
  3. Make sure your Solr queries return as little data as possible. For example, use the rows and fl parameters of the Solr /select requests to only return the records and the fields you really need (see the example below). Any other data that gets returned will be counted as extra bandwidth.
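
For example (the server, index, and field names are placeholders), this query returns only ten results and three fields per document:

https://your-server.solrcluster.com/solr/your_index/select?q=shoes&rows=10&fl=id,title,price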

OpenSolr is a cloud-based search service that provides hosting, management, and support for Apache Solr, a popular open-source search platform. Apache Solr is widely used for its powerful full-text search, hit highlighting, faceted search, dynamic clustering, and rich document handling capabilities. Here are some key aspects of OpenSolr services:

  1. Managed Solr Hosting: OpenSolr takes care of the setup, maintenance, and management of Solr instances. This includes handling server infrastructure, Solr installation, and configuration.

  2. Scalability and Performance: OpenSolr provides scalable solutions, allowing users to increase or decrease resources based on their requirements. This ensures efficient handling of varying search demands and data volumes.

  3. Data Security and Backups: OpenSolr offers secure hosting environments with features like SSL encryption, data backups, and recovery options to protect against data loss and ensure data integrity.

  4. Customizable Search Indexes: Users can create and customize their search indexes according to their specific needs. This includes defining schemas, setting up data import handlers, and configuring search parameters.

  5. User-friendly Interface: OpenSolr typically provides an easy-to-use control panel, allowing users to manage their Solr instances without deep technical knowledge. This includes features for index management, monitoring, and analytics.

  6. Support and Consultation: OpenSolr offers support services, including technical assistance and consulting, to help users optimize their Solr implementations for better search performance and reliability.

  7. Integration and API Support: OpenSolr services often include APIs and integration options, enabling seamless integration with various applications and data sources.

  8. Global Data Centers: To ensure fast and reliable access, OpenSolr may host its services in multiple data centers across the globe.

OpenSolr is suitable for businesses and developers who require robust search capabilities but want to avoid the complexities of self-hosting and managing Solr servers. It's often chosen for its scalability, reliability, and ease of use, particularly in applications like e-commerce platforms, content management systems, and data-intensive websites.

 
 

There are a couple of things you might be able to do to trade performance for index size. For example, an integer (int) field uses less space than a trie integer (tint), but range queries will be slower when using an int.

To make major reductions in your index, you will almost certainly need to look more closely at the fields you are using.

  • Are you using a lot of stored fields? If so, try removing the stored fields from the index and query your database for the necessary data once you've got the results back from Solr.
  • Add omitNorms="true" to text fields that don't need length normalization (see the sketch after this list)
  • Add omitPositions="true" to text fields that don't require phrase matching
  • Special fields, like NGrams, can take up a lot of space
  • Are you removing stop words from text fields?
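
As a minimal sketch (the field and type names are hypothetical), a text field trimmed down this way might look like:

<field name="body" type="text_general" indexed="true" stored="false" omitNorms="true" omitPositions="true"/>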

EZcmd.com is a useful set of GeoData and GeoIP utilities.
