Solr will often use quite a lot of RAM in order to build the search results response.
Therefore, it is important to follow a few Best Practices, to ensure that we do not use more resources than are actually needed.
It often happens that a dedicated Opensolr Environment, with an extensive amount of RAM, won't be able to handle even rather small Solr indexes, because of the incorrect implementation of certain schema.xml configuration parameters, or requests that cause Solr to be killed by the Solr OOM script when the Environment runs out of memory.
Opensolr also has a self-healing process that will kick in for any crashed Solr process, recovering the Solr service in just under 1 minute.
Here are some Best Practices that you can use to mitigate these effects:
- We have defined here a few methods in which you can save transfer bandwidth, which would, in turn, also help with memory management.
- Are you using Solr to return a very large number of documents in one request?
- That will cause Solr to allocate memory for every document it returns, keep that data cached, and then allocate new memory all over again for each new query.
- The solution is to keep the &rows parameter at a value under 100 (&rows=100) as much as possible, and not ask Solr to return more data than is necessary in each request.
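- Example (the hostname and index name below are placeholders for your own Opensolr Environment and index):
https://your-environment.opensolr.com/solr/your_index/select?q=*:*&start=0&rows=100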
- Are you paginating over a very large number of pages?
- Requesting Solr to return documents starting from a very high offset will, again, cause it to allocate memory for all the documents up to that offset and keep that cached, which will quickly exhaust the Environment's RAM.
- The solution is to paginate only through a reasonable number of pages, as much as possible. In other words, keep the &start parameter below 50,000 (&start=50000&rows=100).
- This also depends on how many stored fields you have, as opposed to indexed-only fields: the more stored fields you define in your schema.xml, the more RAM such a high &start parameter will use, since Solr has to allocate memory for each field's data.
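- Example: to fetch page 500 at 100 rows per page, the offset is (500 - 1) x 100 = 49900 (the hostname and index name are, again, placeholders):
https://your-environment.opensolr.com/solr/your_index/select?q=*:*&start=49900&rows=100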
- Are you using heavy Faceting, Sorting, Highlighting, or Grouping queries?
- The solution is that all of these, as a best practice, should be done on docValues=true fields.
- That is to say, in your schema.xml, you should define the fields used for Faceting, Sorting, and Grouping with docValues=true. Note that docValues are only supported on non-tokenized field types, such as string, numeric, and date fields, not on analyzed types like text_general.
- Example:
<field name="name" docValues="true" type="text_general" indexed="true" stored="true" />
- In some cases, defining more parameters will be useful, especially when using highlighting on certain fields.
- Example:
<field name="description" type="text_general" indexed="true" stored="true" docValues="true" termVectors="true" termPositions="true" termOffsets="true" storeOffsetsWithPositions="true" />
- Field, Filter, Query and Document Caching
- In many cases, using such caching on your Solr index will do more harm than good, since those caches won't always be hit, or the hit ratio will be very low, in which case memory gets filled with useless cache entries.
- The solrconfig.xml file has multiple caching configurations that can be tweaked to ensure that caching is not overused:
- filterCache - stores unordered sets of document IDs matching the “fq” (filter query) parameters of your queries.
- queryResultCache - stores ordered lists of document IDs returned by searches.
- documentCache - caches field values that have been defined as “stored” in the schema, so that Solr does not have to go back to the index to fetch and return them for display.
- fieldCache - used to store all of the values for a field in memory rather than on disk. For a large index, the fieldCache can occupy a lot of memory, especially when caching many fields.
- The solution is to find the definitions of these caches in your solrconfig.xml and set them to values as low as possible.
- Example:
<filterCache size="1" initialSize="1" autowarmCount="0"/>
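- The queryResultCache and documentCache elements in the same file can be reduced in the same way (the fieldCache itself is managed internally by Lucene and has no explicit entry in solrconfig.xml):
<queryResultCache size="1" initialSize="1" autowarmCount="0"/>
<documentCache size="1" initialSize="1" autowarmCount="0"/>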
- If using Drupal, make sure you update your Search API Solr module to the latest version, in order to fix this known bug.