Configuration

Opensolr Configuration — find answers to your questions

How to import data into Solr, from XML

Importing data from XML into Opensolr

If you were using Solr's DataImport Handler, note that it was removed in Solr 9.x, so that approach no longer works.
Here's how to write a small script that will import data into your Opensolr Index, from XML files:

#!/bin/bash
USERNAME="<OPENSOLR_INDEX_HTTP_AUTH_USERNAME>"
PASSWORD="<OPENSOLR_INDEX_HTTP_AUTH_PASSWORD>"

echo "Starting import on all indexes..."
echo ""

echo "Importing: <YOUR_OPENSOLR_INDEX_NAME>"
echo "Downloading the xml data file"
wget -q <URL_TO_YOUR_XML_FILE>/<YOUR_XML_FILE_NAME>

echo "Removing all data"
curl -s -u $USERNAME:$PASSWORD "https://<YOUR_OPENSOLR_INDEX_HOSTNAME>/solr/<YOUR_OPENSOLR_INDEX_NAME>/update?commit=true&wt=json&indent=true" -H "Content-Type: text/xml" -d "<delete><query>*:*</query></delete>"
echo ""

echo "Uploading and Importing all data into <YOUR_OPENSOLR_INDEX_NAME>"
curl -u $USERNAME:$PASSWORD "https://<YOUR_OPENSOLR_INDEX_HOSTNAME>/solr/<YOUR_OPENSOLR_INDEX_NAME>/update?commit=true&wt=json&indent=true" --progress-bar -H "Content-Type: text/xml" --data-binary @<YOUR_XML_FILE_NAME> | tee -a "/dev/null" ; test ${PIPESTATUS[0]} -eq 0
echo ""

rm -f <YOUR_XML_FILE_NAME>
echo "Done!"

Everything within the <> brackets must be replaced with your own values: your Opensolr Index Name, your Opensolr Index Hostname, the URL of your XML file, and so forth. You can find all of that information in your Opensolr Index Control Panel, except for the URL to your XML file, since that file is hosted somewhere on your end.

Your XML file should use the classic Solr update format.
This article should show you more about the Solr XML data file format.
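For reference, a minimal Solr update XML file looks like the sketch below. The field names here are hypothetical; use the fields defined in your own schema:

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">My first document</field>
    <field name="content">Some searchable text for the first document.</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title">My second document</field>
    <field name="content">Some searchable text for the second document.</field>
  </doc>
</add>
```

Each `<doc>` becomes one Solr document, and the whole file is what the script above POSTs to the `/update` handler.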


Solr JVM Tuning RAM & Memory Management

🧠💥 Solr JVM Tuning RAM & Memory Management

Solr’s RAM appetite is legendary. Don’t worry, you’re not alone. Let’s help you keep your heap happy, your queries snappy, and your boss off your back.


🤔 Why Does Solr Use So Much Memory?

  • Search results: Returns tons of docs? RAM feast.
  • Caches: Four flavors, all with big appetites.
  • Big fields, bad configs, massive requests: Boom—there goes your heap.
  • Solr: “Give me RAM, and I shall give you… maybe some results.”

🛠️ Best Practices, in Style

1. Save Bandwidth, Save RAM

Fewer bytes → less RAM.
See our bandwidth tips.


2. Limit the rows Parameter!

Don’t return all the docs unless you want Solr to host a BBQ in your memory.

&rows=100

3. Paginate Responsibly

Huge start values = huge RAM usage.
Try not to cross start=50000 unless you really like chaos.
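The offset grows linearly with the page number, which is why deep pages get expensive. A throwaway sketch (the helper name is ours, not Solr's):

```shell
# page_to_start: compute the Solr "start" offset for a 0-based page number.
# Deep pages mean large offsets, and large offsets mean more RAM per request.
page_to_start() {
  page=$1
  rows=$2
  echo $(( page * rows ))
}

page_to_start 0 100    # first page: start=0
page_to_start 500 100  # page 500: start=50000 -- chaos territory
```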


4. docValues or Bust

Faceting, sorting, grouping, highlighting:

<field name="my_field" docValues="true" type="string" indexed="true" stored="true"/>

5. Cache, but Not Like a Hoarder

Tighten up your caches in solrconfig.xml.

<filterCache size="1" initialSize="1" autowarmCount="0"/>

Monitor cache hit ratios; <10% = wasted RAM.


6. JVM Heap: Not a Dumpster, Not a Bathtub

  • Heap size:
    For most, 4g or 8g is enough.
    -Xms4g -Xmx4g
    
  • Garbage Collector:
    Use G1GC (modern, less “stop the world”).
    -XX:+UseG1GC
    
  • GC Tuning:
    For Solr 8+:
    -XX:+UseStringDeduplication
    -XX:MaxGCPauseMillis=200
    
  • Monitor:
    If your GC logs show frequent full GCs, it’s time to optimize.
    Enable GC logging for real insight:
    -Xlog:gc*:file=/var/solr/gc.log:time,uptime,level,tags:filecount=10,filesize=10M
    

7. Watch the Heap & GC

  • In Solr Admin UI, watch for heap >85% or long GC pauses.
  • If your server pauses for coffee breaks, that’s bad news.

8. Index Analytics & Log Watching

  • Use the Opensolr Analytics panel to see who/what is eating RAM.
  • Tail your logs and spot traffic spikes—don’t wait for support to call you.

9. Drupal + Solr = PATCH NOW

Keep Search API Solr current or face the wrath of bugs.


🎯 TL;DR Pro Tips

  • Limit rows and start.
  • Use docValues for anything you facet, sort, or group.
  • Cache like you’re paying rent by the megabyte.
  • Tune JVM heap and GC for your workload, not someone else’s.
  • Watch logs, heap, and GC stats.
  • Patch integrations, always.

🧑‍🔬 JVM Tuning Quick Reference

JVM Option                                  What It Does                       Default/Example
-Xms / -Xmx                                 Min/max heap size                  -Xms4g -Xmx4g
-XX:+UseG1GC                                Use the G1 Garbage Collector       Always, for Java 8+
-XX:MaxGCPauseMillis=200                    Target max GC pause time (ms)      -XX:MaxGCPauseMillis=200
-XX:+UseStringDeduplication                 Remove duplicate strings in heap   Java 8u20+
-Xlog:gc*                                   GC logging                         See above
-XX:+HeapDumpOnOutOfMemoryError             Write heap dump on OOM             Always!
-XX:HeapDumpPath=/tmp/solr-heapdump.hprof   Path for OOM heap dump             Set to a safe disk
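As a sketch, the options above would typically land in your solr.in.sh (on Opensolr hosted indexes this is managed for you; the sizes and paths here are illustrative, not recommendations for every workload):

```shell
# Example solr.in.sh fragment (sizes/paths are illustrative)
SOLR_HEAP="4g"                       # sets both -Xms and -Xmx
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+UseStringDeduplication"
SOLR_OPTS="$SOLR_OPTS -XX:+HeapDumpOnOutOfMemoryError"
SOLR_OPTS="$SOLR_OPTS -XX:HeapDumpPath=/tmp/solr-heapdump.hprof"
```

`SOLR_HEAP` and `GC_TUNE` are the standard solr.in.sh variables; anything else can be appended to `SOLR_OPTS`.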

🤪 Meme Zone: Solr Memory Edition

“How many docs can I return? Solr: Yes.”


🤝 When to Call for Backup

  • Heap usage feels like the national debt
  • Solr restarts become your afternoon coffee break
  • JVM heap dumps are bigger than your backup drive

👉 Contact Opensolr Support — bring logs, configs, and memes. We love a challenge.


The Solr edismax query Parser. How it works and how it helps...

Solr EDisMax Query Parser Guide

Combining the best of DisMax and Extended DisMax with a dash of humor and a nod to traditional Solr wisdom.


Introduction

Solr’s EDisMax (Extended DisMax) query parser is the workhorse for modern search applications. It builds upon the classic DisMax parser, providing more flexibility, advanced features, and sharper control over scoring. In the days of yore, we manually crafted complex Solr queries with multiple TF-IDF tweaks; today, EDisMax handles much of that complexity, letting you focus on practical relevance tuning (and perhaps nostalgically remember those heady days of manual schema edits).

This guide dives deep into EDisMax’s most important parameters:

  • qf
  • mm
  • pf, pf1, pf2
  • ps, ps1, ps2

We’ll cover:

  1. Parameter Overviews: What each parameter does.
  2. Practical Schema Definitions: Example schema.xml configurations.
  3. Query Examples: How to call EDisMax in practice.
  4. Detailed Explanations: Behind-the-scenes of scoring and slop.
  5. Keeping q Clean: Using qf to avoid polluting your query string.

1. EDisMax Parameter Overviews

1.1 qf (Query Fields)

  • Purpose: Defines which fields EDisMax searches and at what boost weights.
  • Why It Matters: Separates “What the user typed” (q) from Solr-specific logic (boosts). Keep your q as pure user intent.

Syntax:

qf=field1^boost1 field2^boost2 ...
  • field1, field2: field names defined in schema.xml.
  • boost (float): relative weight for that field.

Example:

qf=title^2.0 description^1.0 content^0.5

1.2 mm (Minimum “Should” Match)

  • Purpose: Controls how many terms in a multi-term query must match in the targeted fields.
  • Why It Matters: Balances recall vs. precision. Too strict (mm=100%) → few/no results; too loose (mm=0%) → noisy results.

Syntax:

mm=<value>
  • Absolute: mm=2 (at least 2 terms must match)
  • Relative: mm=75% (75% of terms must match)
  • Combined: Supports “N of M” logic: mm=2<90% 5<100%

Example:

mm=2<75% 4<90% 6<100%

Interpretation: For 1–2 terms, all terms must match. For 3–4 terms, 75% must match (rounded down). For 5–6 terms, 90% must match. For more than 6 terms, 100% must match.
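Because Solr rounds percentage-based mm values down, it helps to sanity-check the arithmetic. A small throwaway helper (ours, not part of Solr) reproduces the round-down rule with plain integer division:

```shell
# mm_required: how many of <terms> query terms must match for a
# percentage-style mm value. Solr rounds the computed number DOWN,
# which integer division reproduces exactly.
mm_required() {
  terms=$1
  percent=$2
  echo $(( terms * percent / 100 ))
}

mm_required 3 75   # prints 2  (2.25 rounded down)
mm_required 6 90   # prints 5  (5.4 rounded down)
mm_required 2 75   # prints 1  (1.5 rounded down)
```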


1.3 pf, pf1, pf2 (Phrase Fields)

EDisMax can detect phrase matches (adjacent tokens) and boost documents accordingly.

  • pf

    • Purpose: Apply a boost when the entire (multi-term) query appears as a contiguous phrase in the field.
    • Syntax: pf=fieldA^boostA fieldB^boostB ...
    • Usage: Best for multi-word queries (3+ terms). Combined with ps.
  • pf2

    • Purpose: Boost when any two-term phrase (bigram) from the query matches contiguously.
    • Syntax: pf2=fieldA^boostA fieldB^boostB ...
    • Usage: Useful to capture important two-word phrases even if the full phrase fails.
  • pf1

    • Purpose: Boost when a single-term query appears (helpful for one-word queries to maintain consistency).
    • Syntax: pf1=fieldA^boostA fieldB^boostB ...
    • Usage: For single-word queries; can mimic qf behavior but reserved for phrase logic.

1.4 ps, ps1, ps2 (Phrase Slops)

Phrase slop determines how far apart terms can be and still count as a phrase.

  • ps

    • Slop for pf fields.
    • Syntax: ps=<integer>
    • Example: ps=2 allows up to two token moves (e.g., “quick fox” matching “quick brown fox”).
  • ps2

    • Slop for pf2 (two-term phrases).
    • Syntax: ps2=<integer>
    • Example: with ps2=1, the query “solar power” still matches “solar clean power”, where one extra word sits between the two terms.
  • ps1

    • Slop for pf1 (single-term).
    • Syntax: ps1=<integer> (though slop rarely matters for single-term).
    • Example: Use ps1=0 (exact match) or omit.

2. Practical schema.xml Definitions

Below is an example schema.xml snippet illustrating how to set up fields commonly used with EDisMax. In this fictional “ClassicBook” index, we have fields for title, author, summary, and content.

<!-- schema.xml excerpt for EDisMax example -->
<schema name="classicbook" version="1.6">
  <!-- Field Types -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="1" preserveOriginal="1" catenateAll="1" catenateWords="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
    <filter class="solr.LengthFilterFactory" min="1" max="500"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="1" preserveOriginal="1" catenateAll="1" catenateWords="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
    <filter class="solr.LengthFilterFactory" min="1" max="500"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  </fieldType>

  <!-- Fields -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="string" indexed="true" stored="true"/>

  <field name="summary" type="text_general" indexed="true" stored="true" multiValued="false"/>
  <field name="content" type="text_general" indexed="true" stored="false" multiValued="false"/>

  <!-- Copy Field: Aggregate searchable text -->
  <field name="text_all" type="text_general" indexed="true" stored="false" multiValued="false"/>
  <copyField source="title"      dest="text_all"/>
  <copyField source="author"     dest="text_all"/>
  <copyField source="summary"    dest="text_all"/>
  <copyField source="content"    dest="text_all"/>

  <!-- Default search field: the <defaultSearchField> element has been removed
       in recent Solr versions; set df=text_all in your request handler
       defaults (solrconfig.xml) instead -->

  <!-- Unique Key -->
  <uniqueKey>id</uniqueKey>
</schema>

Notes on Fields:

  • text_all: Combined field with all searchable content. Use text_all in qf, pf, etc.
  • Field-Specific Boosting: We will boost title more than content because titles historically matter more to users.

3. Keeping q Clean with qf

A traditional Solr query might look like:

http://localhost:8983/solr/classicbook/select?q=title:"solar power"^2 summary:"solar power"^1

But that pollutes q with boost logic. Instead, use:

  • q: raw user text, e.g.: solar power
  • qf: field boosts
  • bq, bf: additional boosts (optional)

Example (clean q):

q=solar power
&defType=edismax
&qf=title^3.0 summary^1.5 text_all^0.5
  • Explanation:
    • title matches carry triple weight (^3.0).
    • summary matches carry one-and-a-half weight (^1.5).
    • text_all keeps the engine honest, but at low weight (^0.5).

Your query string remains user-centric.


4. EDisMax in Action: Query Examples

Below are several search examples illustrating how EDisMax parameters affect results.

4.1 Basic Keyword Search with qf and mm

Request:

GET /solr/classicbook/select?
  q=ancient philosophy
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=75%
  &hl=true

What Happens:

  1. Tokenization: “ancient” and “philosophy”
  2. mm=75%:
    • For 2 terms, 75% of 2 = 1.5, which Solr rounds down → at least 1 term must match.
  3. Field Scoring:
    • Matches in title count double compared to summary.
    • Matches in content count minimal.

If a document has “ancient” in title but not “philosophy,” it can still match (1 of 2 terms), but it scores lower. Documents containing both words, especially in the title, rank highest.


4.2 Phrase Boosting with pf and ps

Request:

GET /solr/classicbook/select?
  q=ancient philosophy treatise
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=2<75% 3<90% 4<100%
  &pf=title^5 summary^3
  &ps=2

What Happens:

  1. Term Matching:

    • mm=2<75%: For 3 terms, the 75% bracket applies → 3 × 0.75 = 2.25, rounded down → at least 2 terms must match.
    • For 4 terms, 90% applies → 4 × 0.9 = 3.6 → at least 3 terms.
    • Effect: Fairly strict multi-term matching.
  2. Phrase Boost (pf):

    • If “ancient philosophy treatise” appears [with up to 2-word gaps (ps=2)] in title or summary, a significant boost is applied.
    • E.g., “ancient philosophical treatise” (one word in between) still qualifies.
  3. Scoring Order:

    • Exact phrase in title > phrase in summary > term matches alone.

4.3 Two-Term Phrase with pf2 and ps2

Suppose we want to capture strong two-word phrases:

GET /solr/classicbook/select?
  q=quantum mechanics equations
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=2<75% 3<90% 4<100%
  &pf2=title^4 summary^2
  &ps2=1

What Happens:

  • For any two-term phrase from “quantum mechanics equations” (e.g., “quantum mechanics”, “mechanics equations”):
    • If “quantum mechanics” appears with slop ≤ 1 in title, boost by 4.
    • In summary, boost by 2.

Thus, even if the full three-term phrase isn’t present, two-term pairs can surface important context.


4.4 One-Term Phrase with pf1 and ps1

For completeness:

GET /solr/classicbook/select?
  q=philosophy
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=1
  &pf1=title^3 summary^1
  &ps1=0
  • Single-term query “philosophy.”
  • pf1 boosts documents where “philosophy” appears in title (×3) or summary (×1).
  • ps1=0 means exact match; no slop needed.

5. Deep Dive into Parameters

5.1 qf: The Heart of Clean Queries

<!-- In solrconfig.xml (RequestHandler) -->
<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">
      title^3.0
      summary^1.5
      text_all^0.5
    </str>
    <str name="mm">2&lt;75% 4&lt;90% 6&lt;100%</str>
    <str name="pf">title^4 summary^2</str>
    <str name="ps">2</str>
    <str name="pf2">title^3 summary^1</str>
    <str name="ps2">1</str>
    <str name="pf1">title^2 summary^1</str>
    <str name="ps1">0</str>
  </lst>
</requestHandler>
  • Why Defaults?
    Embedding your qf and mm in defaults ensures consistency across all calls. You can override at query time if needed.

  • Traditional Tip:
    “Once you set sensible default boosts, you save yourself countless hours of tweaking individual requests.”

5.2 mm: Balancing Recall vs. Precision

  • Absolute vs. Percentage:

    • Use absolute (mm=2) for short queries (2–3 words).
    • Use relative (mm=50%) for longer queries.
  • Composite Syntax:

    • mm=2<75% 4<90% 6<100%:
      • 1–2 terms → all terms must match.
      • 3–4 terms → 75%, rounded down (2 of 3, 3 of 4).
      • 5–6 terms → 90%, rounded down (4 of 5, 5 of 6).
      • 7+ terms → 100% (strict).

Pro Tip (Traditional Wisdom): If you haven’t set mm and rely on default fuzzy matching, you might end up with the dreaded “too many results” syndrome.


5.3 pf, pf1, pf2: Phrase Boosting Strategies

  • Why Phrase Boost?
    Users often type queries that imply an exact phrase (“sherlock holmes stories”). Reward documents that honor the phrase. It’s like giving a tip to the bartender for a well-made drink—acknowledge exactly what was asked.

  • pf (General Phrase Boost):

    • Targets the entire query sequence.
    • Combined with ps to allow “wiggle room” (slop).
  • pf2 (Bigram Boost):

    • Useful when the full phrase fails or user typed a longer phrase.
    • Captures strong two-word signals.
  • pf1 (Unigram Boost):

    • Useful for consistency in single-word queries.
    • Ensures short queries also benefit from phrase logic.
  • Fine-Tuning Slop (ps, ps2, ps1):

    • ps=2: Up to 2-term gaps in phrase.
    • ps2=1: Up to 1-term gap in bigrams.
    • ps1=0: No gap for single-term (exact).

5.4 Putting It All Together: Sample solrconfig.xml

<config>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>

      <!-- Core Query Fields -->
      <str name="qf">
        title^3.0
        summary^1.5
        text_all^0.5
      </str>

      <!-- Minimum “Should” Match -->
      <str name="mm">2&lt;75% 4&lt;90% 6&lt;100%</str>

      <!-- Phrase Boosts -->
      <str name="pf">title^5 summary^3</str>
      <str name="ps">2</str>
      <str name="pf2">title^4 summary^2</str>
      <str name="ps2">1</str>
      <str name="pf1">title^3 summary^1</str>
      <str name="ps1">0</str>

      <!-- Highlighting Defaults -->
      <str name="hl">true</str>
      <str name="hl.fl">title,summary,content</str>
    </lst>
  </requestHandler>
</config>

Explanation:

  1. qf: Splits search across fields, boosting title most.
  2. mm: Balances how many terms must match.
  3. pf: Big phrase boost to reward exact (or near-exact) matches.
  4. pf2: Two-word phrase boost, capturing key bigrams.
  5. pf1: Single-term phrase boost, ensuring one-word queries still get a leg up.
  6. ps, ps2, ps1: Slop controls—allowing some wiggle but not too much.

6. Advanced Examples & Scenarios

6.1 Long Queries with pf2 and pf

Scenario:
A user queries: “quantum field theory experiments at low temperatures”.

GET /solr/classicbook/select?
  q=quantum field theory experiments at low temperatures
  &defType=edismax
  &qf=title^2 summary^1 text_all^0.3
  &mm=4<75% 7<90% 10<100%
  &pf=title^5 summary^3
  &ps=2
  &pf2=title^4 summary^2
  &ps2=1
  • Term Count: 7 terms.
  • mm: 7 terms fall into the 4<75% bracket (not more than 7), so 7 × 0.75 = 5.25, rounded down → at least 5 terms must match.
  • Phrase Logic:
    • If “quantum field theory experiments” appears (with ≤ 2-term slop) in title, massive boost.
    • Even if that fails, any two-word pairs like “quantum field” or “field theory” get a smaller boost.

Vintage Wisdom: Back in the day, we hand-crafted this logic with multiple q clauses. Now, EDisMax elegantly bundles it.


6.2 Query-Time Overrides

Though defaults serve most cases, you can override parameters on the fly:

GET /solr/classicbook/select?
  q=renaissance art paintings
  &defType=edismax
  &qf=title^4 summary^2 text_all^0.4
  &pf=title^6 summary^3
  &ps=3
  &mm=2<80% 5<90% 8<100%
  &rows=20
  &sort=score desc, publish_date desc
  • Query-Specific Boosts: Bump title further for “art”-centric queries.
  • Phrase Slop: Increased to 3 to allow permutations like “art of renaissance paintings”.
  • Sorting: Traditional “score then date” ordering.

7. Best Practices and Traditional Tips

  1. Set Sensible Defaults:

    • Tune qf and mm in your solrconfig.xml defaults. Overrides are the exception, not the rule.
  2. Phrase Boosting Wisdom:

    • Start with pf + ps=2. See if that yields decent results. Add pf2 if long queries are common.
    • Resist the urge to crank ps too high—phrase logic gets too loose.
  3. Keep q Pure:

    • Always let q be user-provided text. Place boosting/logic in separate parameters like qf, bq, bf.
  4. Iterate with Analytics:

    • Review Solr’s query logs. Are users often getting no results? Adjust mm.
    • Are phrase matches failing? Tweak ps or add pf2.
  5. Legacy Reminder:

    • If you remember basic Lucene queries (+title:(ancient philosophy) +summary:(ancient philosophy)), EDisMax is your elegant shortcut. Honor the old ways by acknowledging them, but embrace convenience.

8. Summary

EDisMax streamlines complex boolean and phrase-scoring logic into a cohesive, user-friendly API. By mastering:

  • qf: Keep your query field mappings clear and boost weights sensible.
  • mm: Balance recall and precision with thoughtful minimum-match.
  • pf, pf1, pf2, ps, ps1, ps2: Elevate phrase relevance without over-engineering.

…you retain the “traditional Solr craftsmanship” while leveraging modern conveniences. May your search results be ever relevant, your phrase boosts well-balanced, and your schema forever flexible.


Written with respect for the traditions of Solr and a wink to the days of manual Lucene queries. Enjoy!


Autocomplete in Solr and EdgeNGram Fields - the eCommerce he...

Solr edgy_text and edgeNgram Explanation

1. edgy_text Field Type:

The edgy_text field type typically uses Edge N-gram tokenization to create prefix-based tokens for indexing. This means the field is indexed in a way that allows matching partial terms or phrases, improving search results, especially for autocomplete or search-as-you-type functionality.

What is Edge N-gram Tokenization?

  • Edge N-grams are tokens that represent progressively longer prefixes of a word. For example, if you tokenize the word "example", the Edge N-gram tokenizer could produce tokens like:
    • "e"
    • "ex"
    • "exa"
    • "exam"
    • "examp"
    • "exampl"
    • "example"

This is useful for autocompletion, as it allows users to search by just typing the beginning of a word (or a term) and get matches based on the indexed prefixes.

How edgy_text works in Solr:

The edgy_text field type in Solr is usually implemented using the Edge N-gram filter. This filter generates prefixes of the input text and indexes those prefixes. It works well for fields where you want to allow prefix-based matching or autocomplete-like behavior.

2. edgeNgram Field Type:

The edgeNgram filter, when used with a Solr field type, can be applied to a field to tokenize the input data (like product names, categories, or SKUs) into progressively longer prefixes of words.

For Example:

If a field like product_name contains "laptop", the edgeNgram filter might produce the following tokens:

  • "l"
  • "la"
  • "lap"
  • "lapt"
  • "lapto"
  • "laptop"

This allows a search query like "lap" to match "laptop", making it perfect for search-as-you-type or autocomplete features.
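To make the token stream concrete, here is a throwaway sketch (our helper, not Solr code) that reproduces what an edge n-gram filter with minGramSize="1" emits for a single token:

```shell
# edge_ngrams: print every prefix of a token, shortest first, one per
# line -- the same token stream EdgeNGramFilterFactory (minGramSize=1)
# would index for that token.
edge_ngrams() {
  word=$1
  i=1
  while [ "$i" -le "${#word}" ]; do
    printf '%s\n' "$(printf '%s' "$word" | cut -c1-"$i")"
    i=$(( i + 1 ))
  done
}

edge_ngrams laptop   # prints: l la lap lapt lapto laptop (one per line)
```

Any query that equals one of these prefixes ("lap", "lapto", ...) will find the document at search time.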

Example of Edge Ngram in Solr:

Here's how you can configure both edgy_text and edgeNgram in Solr:

1. edgy_text FieldType Definition:

This edgy_text field type uses the Edge N-gram filter to break words into progressively longer prefixes.

<fieldType name="edgy_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Standard tokenizer to split the text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Edge N-gram filter to create prefixes from each word -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- At query time the user's input is NOT edge-ngrammed: the typed
         prefix should match the indexed prefixes directly. Applying the
         EdgeNGram filter here too would make short queries match far too
         broadly (e.g. "lap" would also match everything starting with "l"). -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
  • minGramSize="1": Specifies that the smallest prefix (starting point) will be of length 1 (e.g., "l" from "laptop").
  • maxGramSize="25": Limits the length of the token to a maximum of 25 characters. You can adjust this based on your needs.
2. Using edgy_text for Fields:

Once you've defined this field type, you can apply it to any field where you want to enable prefix matching. For instance, for a product name or SKU:

<field name="product_name" type="edgy_text" indexed="true" stored="true"/>
<field name="sku" type="edgy_text" indexed="true" stored="true"/>

These fields now support partial matching for searches, such as a query for "lap" returning "laptop".

3. Benefits:
  • Prefix matching: Users can search by typing partial words, and Solr will return all results that match the prefix.
  • Autocomplete: This can be used to create search-as-you-type features in your application, where suggestions appear as users type.
  • Efficient search: Since Solr indexes only prefixes, it can efficiently match terms, especially with large datasets.
4. Use Case:

Consider an e-commerce site where users are typing product names in a search box. As the user types the prefix of the product name (e.g., "lap" for "laptop"), Solr can quickly return relevant results because the field is indexed with all possible prefixes of the product name. This results in faster search suggestions and matches.

Conclusion:

  • edgy_text with Edge N-gram: This combination helps create an efficient, powerful search feature for prefix-based matches, such as autocompletion.
  • Use Cases: Ideal for search-as-you-type functionality, improving user experience by providing real-time search suggestions or partial matches.

Best Fulltext Solr Fields

How to Calibrate Solr Search for Optimal Results with text_general and text_general_phonetic Fields

Fine-tuning your Solr search queries is essential for getting the most relevant results. By using the text_general and text_general_phonetic field types, you can significantly enhance the quality of your search output.

Solr Field Types: text_general & text_general_phonetic

text_general Field

This field type is designed to handle general text content, such as titles, descriptions, and body text. It provides a powerful set of analyzers:

  • HTML Stripping: Strips HTML tags from the content to clean it up.
  • Unicode Tokenization: Breaks the text into tokens using an ICU tokenizer.
  • Word Delimiters: Handles word and number splitting to ensure proper tokenization.
  • Stopword Removal: Removes common stopwords from the text for more relevant results.
  • Stemming: Reduces words to their root form, making searches more accurate.

text_general_phonetic Field

The text_general_phonetic field type is perfect for phonetic searches, where sound-based similarity matters. It is similar to text_general, but with added functionality for phonetic matching:

  • Beider-Morse Phonetic Encoding: Encodes words based on their sound, allowing for sound-based matching.
  • Word Delimiters & Tokenization: Ensures that tokens are split appropriately for phonetic coding.
  • Synonym Support: Can expand synonyms during query time, improving search flexibility.
  • Accents Handling: Accents are preserved for better phonetic matching.

Optimizing Search Parameters for Best Results

Once you have your field types set up, the next step is to calibrate the search parameters for optimal results. Solr provides several parameters that allow you to adjust your search behavior, including mm, qf, and bf.

Understanding the Parameters

  • mm (Minimum Should Match): This parameter defines the minimum percentage of terms in the query that must match. It's essential to adjust this based on the number of words in the query to balance precision and recall.
  • qf (Query Fields): Defines which fields to query and assigns boosting factors. The more relevant fields are boosted higher for better precision.
  • bf (Boost Functions): Allows you to boost the results based on specific functions, such as freshness or recency of documents.

Example: Query Parameter Calibration

Here's an example of how to calibrate the parameters:

params["qf"] = "title^10 description^7 text^5 phonetic_title^0.3 phonetic_description^0.2 phonetic_text^0.1";
params["mm"] = "75%";
params["bf"] = "recip(rord(timestamp),1,1500,1500)^29";

In this example, the query boosts the title field heavily and adjusts the match percentage to 75%. Additionally, the bf function boosts more recent documents.
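The bf above uses Solr's recip function, which computes recip(x,m,a,b) = a / (m*x + b); with rord(timestamp) as x, the newest documents (reverse rank near 1) get a boost near 1.0 that decays for older documents. A quick sketch of the curve, with awk standing in for Solr's float math:

```shell
# recip: evaluate Solr's recip(x,m,a,b) = a / (m*x + b) for illustration.
recip() {
  awk -v x="$1" -v m="$2" -v a="$3" -v b="$4" \
    'BEGIN { printf "%.2f\n", a / (m * x + b) }'
}

recip 1 1 1500 1500      # newest doc (rord=1): boost ~1.00
recip 15000 1 1500 1500  # much older doc: boost ~0.09
```

The a and b constants (both 1500 here) control how quickly the boost decays as documents age in the ranking.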

Field Definitions and Example Queries

Here are the actual field definitions for the text_general and text_general_phonetic field types:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="1" preserveOriginal="1" catenateAll="1" catenateWords="1"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
        <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
        <filter class="solr.LengthFilterFactory" min="1" max="500"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="1" preserveOriginal="1" catenateAll="1" catenateWords="1"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
        <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
        <filter class="solr.LengthFilterFactory" min="1" max="500"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
    <!--Phonetic Text Field-->
    <fieldType name="text_general_phonetic" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <!-- 1) Strip HTML -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
    
        <!-- 2) Tokenize on Unicode word boundaries rather than bare whitespace -->
        <!--    WhitespaceTokenizer will treat “Co‐op” as one token, but you probably want “Co” + “op”. -->
        <tokenizer class="solr.ICUTokenizerFactory"/>
    
        <!-- 3) Remove stopwords early on -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    
        <!-- 4) Break apart numbers/words but keep the original spelling for phonetic coding -->
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" splitOnNumerics="1" splitOnCaseChange="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
                preserveOriginal="1"
                protected="protwords.txt" />
    
        <!-- 5) Lowercase now so phonetic sees normalized input -->
        <filter class="solr.LowerCaseFilterFactory"/>
    
        
        
        <filter class="solr.PorterStemFilterFactory"/>
    
    
        <!-- 7) Fold accents (but keep originals so BeiderMorse sees both accented & un-accented) -->
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    
        <!-- 8) Synonyms (optional—but note: synonyms + phonetics = explosion of tokens) -->
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    
        <!-- 9) Phonetic coding: only keep **one** code per token if possible -->
        <!--    nameType="GENERIC" ruleType="APPROX" is fine, but “concat=true” will glue codes together. -->
        <!--    For better control, set concat="false" so each code is its own token. -->
        <filter class="solr.BeiderMorseFilterFactory"
                nameType="GENERIC"
                ruleType="APPROX"
                concat="false"
                languageSet="auto"/>
      </analyzer>
    
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" splitOnNumerics="1" splitOnCaseChange="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
                preserveOriginal="1"
                protected="protwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.BeiderMorseFilterFactory"
                nameType="GENERIC"
                ruleType="APPROX"
                concat="false"
                languageSet="auto"/>
      </analyzer>
    </fieldType>

Conclusion

Calibrating Solr’s search parameters for specific field types like text_general and text_general_phonetic ensures that you’re getting the most relevant results from your searches. By adjusting key parameters like mm, qf, and bf, you can refine your search queries and achieve optimal performance tailored to your needs.


How to Define New Fields in schema.xml

Overview

The schema.xml file defines the structure of your Solr index — it tells Solr what fields exist, what type of data they hold, and how they should be analyzed during indexing and search. Understanding how to add and configure fields is essential for any Solr implementation.

Where to Edit schema.xml

In your Opensolr account, navigate to your index management area at:

https://opensolr.com/admin/solr_manager/tools/YOUR_INDEX_NAME

Click on the Edit schema.xml tab to open the schema editor.

Defining a New Field

Fields are defined inside the <fields> section of schema.xml. To add a new field, insert a <field> element with the appropriate attributes:

<field name="full_name" type="text_general" indexed="true" stored="true" />

Field Attributes Explained

  • name — The field name, used when indexing and querying documents.
  • type — The field type, which determines how data is analyzed and stored (see below).
  • indexed="true" — The field is searchable. Set to false if you only need to retrieve the value.
  • stored="true" — The original value is stored and can be returned in search results.
  • multiValued="true" — The field can hold multiple values (e.g., tags).
  • required="true" — A document cannot be indexed without this field (typically only used for the id field).

Common Field Types

Type              Description                              Use Case
text_general      Tokenized text with standard analysis    Full-text search fields (titles, descriptions)
string            Exact, untokenized string                IDs, categories, facets, filters
int / pint        Integer value                            Counts, quantities
tint              Trie-based integer (legacy)              Range queries on integers
tfloat / tdouble  Trie-based decimal numbers (legacy)      Prices, measurements, range queries
boolean           True/false value                         Flags, toggles
date / pdate      Date/time in ISO 8601 format             Timestamps, date filtering

Example: Range Queries with Numeric Fields

If you define an integer field using a trie type, you can perform range queries. For example:

<field name="age" type="tint" indexed="true" stored="true" />

You can then query for all documents where age is 13 or greater (square brackets make the range inclusive):

https://YOUR_SOLR_HOST/solr/YOUR_INDEX/select?q=age:[13 TO *]

The same principle applies to tfloat and tdouble fields for decimal range queries, such as filtering products by price.
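When the range query is sent from a script rather than a browser, the brackets and the space in [13 TO *] must be percent-encoded. A minimal sketch, assuming placeholder hostname and index name:

```shell
#!/bin/bash
# Placeholder values -- substitute your real Opensolr host and index name.
SOLR_HOST="YOUR_SOLR_HOST"
INDEX="YOUR_INDEX"

# age:[13 TO *] percent-encoded: ':' -> %3A, '[' -> %5B, ' ' -> %20, ']' -> %5D
QUERY="age%3A%5B13%20TO%20*%5D"
URL="https://${SOLR_HOST}/solr/${INDEX}/select?q=${QUERY}&wt=json"
echo "$URL"

# Uncomment to run against a live index:
# curl -s "$URL"
```

Alternatively, curl can do the encoding for you: curl -G "https://HOST/solr/INDEX/select" --data-urlencode "q=age:[13 TO *]".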

Dynamic Fields

Your schema.xml also contains dynamic field definitions that use wildcard patterns. These automatically match any field name that fits the pattern:

<dynamicField name="*_s" type="string" indexed="true" stored="true" />
<dynamicField name="*_i" type="int" indexed="true" stored="true" />
<dynamicField name="*_t" type="text_general" indexed="true" stored="true" />

Dynamic fields let you index data without explicitly defining every field. For example, a field named color_s would automatically be treated as a string. You can leave these in place or remove them — they do not affect your explicitly defined fields.
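For instance, a document using those suffixes can be indexed without any schema changes. A sketch, with placeholder host and index names:

```shell
#!/bin/bash
# Placeholder values -- replace with your own Opensolr details.
SOLR_HOST="YOUR_SOLR_HOST"
INDEX="YOUR_INDEX"

# color_s matches the *_s pattern (string) and stock_i matches *_i (int),
# so neither field needs an explicit <field> definition in schema.xml.
DOC='[{"id":"prod-1","color_s":"red","stock_i":42}]'
echo "$DOC"

# Uncomment to send to a live core:
# curl -s "https://${SOLR_HOST}/solr/${INDEX}/update?commit=true" \
#   -H "Content-Type: application/json" --data-binary "$DOC"
```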


How to Setup Solr MySQL Data Import Handler (DIH)

Important Notice: DIH Removed in Solr 9+

The Data Import Handler (DIH) module has been removed from Apache Solr starting with version 9. If you are using Solr 9 or later, DIH is no longer available and you will need to use alternative data ingestion methods such as the Solr REST API, SolrJ, or tools like Apache NiFi.

Note that Solr versions 3, 4, 5, 6, 7, and 8 have all reached End of Life (EoL) per the Apache Software Foundation. However, Opensolr is the only hosting provider that still fully supports all of those legacy versions for customers who rely on them — including full DIH support.

What Is the Data Import Handler?

The Solr Data Import Handler (DIH) allows you to import data directly from a relational database (such as MySQL, PostgreSQL, or Oracle) into your Solr index. Instead of writing your own import scripts, you configure an XML file that tells Solr how to connect to your database and which SQL query to run.

Step 1: Edit Your db-data-config.xml

In your Opensolr Control Panel, navigate to your index management area. You will find the option to edit the db-data-config.xml file. This file defines your database connection and the query used to fetch data.

Here is an example configuration:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://YOUR_DB_HOST/YOUR_DB_NAME"
              user="your-username"
              password="your-password" />
  <document>
    <entity name="id"
            query="SELECT id, name, description FROM your_table">
    </entity>
  </document>
</dataConfig>

Step 2: Ensure Your Schema Matches

The fields returned by your SQL query must be defined in your schema.xml. For example, if your query returns id, name, and description, your schema.xml must have corresponding field definitions:

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="description" type="text_general" indexed="true" stored="true" />

Step 3: Run the Import

Once configured, you can trigger a full import from the Opensolr Control Panel or via the DIH endpoint:

curl "https://YOUR_SOLR_HOST/solr/YOUR_INDEX/dataimport?command=full-import"


Alternatives for Solr 9+

Since DIH is not available in Solr 9 and later, consider these alternatives for importing data from MySQL:

  • Solr REST API — Send JSON or XML documents directly via HTTP POST to the /update endpoint.
  • CSV Import — Export your MySQL data as CSV and upload it to Solr using the /update/csv handler.
  • Client Libraries — Use libraries like SolrJ (Java), Solarium (PHP), or pysolr (Python) to programmatically index data.
  • Apache NiFi — A powerful data flow tool that can pull from MySQL and push to Solr.
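As an illustration of the first alternative, documents can be written out as JSON and POSTed to the /update endpoint. This is only a sketch; the host, index, and field names below are placeholders:

```shell
#!/bin/bash
# Sketch of a DIH replacement: write your MySQL rows out as JSON
# documents, then POST the file to Solr's /update handler.
cat > docs.json <<'EOF'
[
  {"id": "1", "name": "First product",  "description": "Example row one"},
  {"id": "2", "name": "Second product", "description": "Example row two"}
]
EOF
echo "docs.json written"

# Uncomment to push to a live index:
# curl -s "https://YOUR_SOLR_HOST/solr/YOUR_INDEX/update?commit=true" \
#   -H "Content-Type: application/json" --data-binary @docs.json
```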

Upload and Import Data from CSV to Your Opensolr Index

Overview

Solr has built-in support for importing data from CSV files via the /update/csv request handler. This is one of the easiest ways to bulk-load data into your Opensolr index without writing any code.


Importing CSV via curl

You can upload a CSV file directly from the command line using curl:

curl "https://YOUR_SOLR_HOST/solr/YOUR_INDEX/update/csv?commit=true" \
  --data-binary @your_data.csv \
  -H "Content-Type: application/csv"

Common Parameters

  • commit=true — Commits the data immediately after import so it becomes searchable.
  • separator=%09 — Use a tab separator instead of comma (for TSV files).
  • skip=unwanted_column — Skip a column that exists in your CSV but is not in your schema.
  • fieldnames=id,title,description — Override column names if your CSV header does not match your schema field names.
  • overwrite=true — Update existing documents if the unique key (id) already exists.
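These parameters can be combined in one request. A sketch importing a tab-separated file while skipping a column that is not in the schema (host and index names are placeholders):

```shell
#!/bin/bash
# Build a small TSV file; internal_notes is a column we do NOT want indexed.
printf 'id\ttitle\tinternal_notes\n1\tHello world\tskip me\n' > data.tsv
echo "wrote data.tsv"

# separator=%09 marks the upload as tab-delimited;
# skip=internal_notes tells Solr to ignore that column entirely.
# curl -s "https://YOUR_SOLR_HOST/solr/YOUR_INDEX/update/csv?commit=true&separator=%09&skip=internal_notes" \
#   --data-binary @data.tsv -H "Content-Type: application/csv"
```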

Auto-Generating Unique IDs

If your CSV file does not have a unique ID column, you can configure Solr to auto-generate unique IDs. Add this updateRequestProcessorChain to your solrconfig.xml:

<updateRequestProcessorChain>
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Add this snippet right below one of the </searchComponent> closing tags in your solrconfig.xml file. With this in place, Solr will automatically assign a UUID to each imported row.

Preparing Your CSV File

  • The first row must contain column headers that match your schema.xml field names.
  • Use UTF-8 encoding for proper character support.
  • Ensure your data does not contain unescaped commas within fields — use double quotes around fields that contain commas.
  • Remove any BOM (Byte Order Mark) that some editors add to the beginning of CSV files.

Troubleshooting Common Issues

  • "Unknown field" error — Your CSV header contains a field name that is not defined in schema.xml. Either add the field to your schema or use the skip parameter.
  • "Document is missing mandatory uniqueKey field: id" — Your CSV does not have an id column. Either add one or configure the UUID processor above.
  • Data not appearing in search — Make sure you include commit=true in your request, or send a separate commit command after uploading.
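For the last point, a commit can be sent on its own after the upload. The hostname and index name below are placeholders:

```shell
#!/bin/bash
# Placeholder values -- replace with your own Opensolr details.
SOLR_HOST="YOUR_SOLR_HOST"
INDEX="YOUR_INDEX"

# An empty update request with commit=true makes previously
# uploaded documents visible to searches.
COMMIT_URL="https://${SOLR_HOST}/solr/${INDEX}/update?commit=true"
echo "$COMMIT_URL"
# curl -s "$COMMIT_URL"
```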

Solr Configuration Files Dependency Flow

📦 How to Upload Solr Configuration Files (Like a Pro!)

Solr thrives on configuration files—each with its own special job.
Whether you're running a classic Solr install, a CMS like Drupal, or even going rogue with WordPress and WPSOLR, proper configuration is key.


🤓 Why Does the Order Matter?

Solr configurations often reference each other (think: dependencies). If you upload them in the wrong order, you’ll get errors, failed indexes, and possibly even a mild existential crisis.


🚦 The "Three Archive" Method (aka Solr Zen)

When uploading your Solr config files via the Opensolr Index Control Panel, follow this foolproof order:

  1. Dependencies First!
    Create and upload a .zip containing all dependency files (such as .txt files, schema-extra.xml, solrconfig-extra.xml, synonyms, stopwords, etc).
    Basically, everything except the main schema.xml and solrconfig.xml.

  2. Schema Second!
    Zip and upload just your schema.xml file.
    This file defines all fields and refers to resources from the previous archive.

  3. solrconfig Last!
    Finally, zip and upload your solrconfig.xml file.
    This references your schema fields and ties all the magic together.

In summary:
1️⃣ Dependencies → 2️⃣ schema.xml → 3️⃣ solrconfig.xml
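The steps above can be sketched as three zip commands. The file names are examples only; include whatever dependency files your configs actually reference:

```shell
#!/bin/bash
# Upload order: dependencies first, then schema.xml, then solrconfig.xml.
ORDER="1_dependencies.zip 2_schema.zip 3_solrconfig.zip"

# zip 1_dependencies.zip stopwords.txt synonyms.txt protwords.txt schema-extra.xml
# zip 2_schema.zip schema.xml
# zip 3_solrconfig.zip solrconfig.xml

for archive in $ORDER; do
  echo "Upload: $archive"
done
```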


⚡️ Can I Automate This?

Absolutely!
Use the Opensolr Automation REST API to upload your config files programmatically.
Because, let’s face it, real wizards script things.


📝 Pro Tips

  • Always double-check references between config files!
  • If you're using a CMS, look for community best practices on managing Solr configs.
  • Feeling unsure? Upload one at a time, in the order above, and test after each.

Now go forth and upload with confidence! 🦾


Support for auto phrase tokenfilter jar

🧩 Using the AutoPhrase TokenFilter JAR in Opensolr

The AutoPhrase TokenFilter is a powerful Solr plugin that helps you recognize and index multi-word expressions as single tokens (think: "New York City" as one unit, not three). This can significantly improve the quality of search, autocomplete, and analytics.


⚡️ Is It Enabled by Default?

Not on all Opensolr environments!
If you're trying to use the AutoPhraseTokenFilterFactory and see errors like:

Plugin not found: solr.AutoPhraseTokenFilterFactory

...then the jar isn't active on your server (yet).


🛠️ What To Do?

  1. Contact Us
    Simply send us a request and we'll install the AutoPhrase library (or pretty much any other custom Solr plugin) for you.

  2. How to Request a Plugin

    • Follow the step-by-step guide: How do I add a lib/jar file?
    • Let us know which version of Solr you're using (the right jar version matters!).
    • Optionally, send the JAR file directly if it's a custom or non-public library.
  3. After Installation

    • Once the plugin is in place, add the appropriate <filter class="solr.AutoPhraseTokenFilterFactory" ... /> element to your field type in schema.xml.
    • Reload your core to activate the new filter.
    • Don't forget to update your schema or config if needed—AutoPhrase sometimes requires its own config files or phrase lists.
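Once installed, the filter slots into an analyzer chain like any other token filter. The attribute names below are assumptions that depend on the exact plugin build you have installed, so treat this as a sketch rather than a drop-in config:

```xml
<!-- Hypothetical field type; "phrases" is assumed to point to a file
     listing multi-word expressions, one per line. -->
<fieldType name="text_autophrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.AutoPhraseTokenFilterFactory" phrases="autophrases.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```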

🚨 Gotchas & Tips

  • Version Compatibility: Always use a plugin version that matches your Solr version.
  • Security: Opensolr reviews all uploaded JARs for security reasons—public/official plugins are easier/faster to approve!
  • Performance: Heavy custom token filters (like AutoPhrase) can impact indexing speed. Test with your real data!

Questions? Contact Opensolr Support — we’re happy to help!

(If you’re a plugin power user, give us a heads up and we’ll have your Solr instance doing backflips in no time. 🕺)


How do I add a lib jar file

🏗️ Using Custom JAR Libraries in Opensolr

Need a special Solr plugin or custom filter?
No problem! Opensolr supports custom JAR libraries—so you can fine-tune your search platform with advanced features.


🚚 How to Install a Custom JAR Library

  1. Send Us Your JAR
    Email your custom JAR file (or a link to the official plugin page where binaries are already compiled) to support@opensolr.com.

  2. Include This Info

    • Your Opensolr Registration Email Address
    • The Opensolr Index Name (where you want the plugin installed)
  3. Installation Timeline

    • Most installations are done within a couple of hours (we say "up to 24 hours" to cover rare edge cases and to sound like responsible adults).
    • If the plugin is fully compatible with your Solr version, it's usually lightning fast!

🛡️ Pro Tips for Success

  • Send the JAR File Itself
    Don’t just send the source code. We need the compiled .jar binary!
  • Official Sources Are Best
    For security and speed, send links to official or reputable plugin pages.
  • Version Match Matters
    Double-check that your JAR matches your Solr version—otherwise it might throw errors (or, even worse, not work at all).

🔄 After Installation

Once we've installed the plugin:

  • Update your schema.xml or solrconfig.xml to use your new library (we can help with this if needed).
  • Reload your Solr core to activate the changes.
  • Test your configuration—give it a spin!

Questions? Stuck?
Email support@opensolr.com and our tech team will leap into action (well, at least open their laptops and get right on it).


With Opensolr, you’re never stuck with just the basics. Power up your index—your way! ⚡️


Move from managed-schema to schema.xml

To move from using the managed-schema to schema.xml, simply follow the steps below:

In your solrconfig.xml file, look for the schemaFactory definition. If you have one, remove it and add this instead:

<schemaFactory class="ClassicIndexSchemaFactory"/>

If you don't have one, just add the above snippet somewhere above the requestHandler definitions.

Move from schema.xml to managed-schema

To move your Opensolr index from the classic schema.xml to the managed-schema, simply follow the steps below:

In your solrconfig.xml, look for a SchemaFactory definition, and replace it with this snippet:

   <schemaFactory class="ManagedIndexSchemaFactory">
      <bool name="mutable">true</bool>
      <str name="managedSchemaResourceName">managed-schema</str>
   </schemaFactory>

If you don't have any schemaFactory definition, just paste the above snippet into your solrconfig.xml file, just above any requestHandler definition.

Why Move to Managed-Schema?

There are several reasons you might want to switch from the classic schema.xml approach to managed-schema:

  • Schema API Support: Managed-schema enables the Solr Schema API, which allows you to add, modify, and delete fields via REST endpoints without manually editing XML files.
  • Dynamic Field Management: Applications like Drupal's Search API Solr module rely on the Schema API to automatically configure fields. If your integration requires it, managed-schema is the way to go.
  • Programmatic Schema Changes: If you need to update your schema from code (CI/CD pipelines, deployment scripts), the Schema API makes this straightforward with simple HTTP requests.
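As an example of that last point, adding a field over the Schema API is a single HTTP call. The host, core, and field names below are placeholders:

```shell
#!/bin/bash
# Placeholder values -- replace with your own hostname and core name.
SOLR_HOST="YOUR_HOSTNAME"
CORE="YOUR_CORE"

# Schema API payload adding an indexed, stored text field.
PAYLOAD='{"add-field": {"name": "subtitle", "type": "text_general", "indexed": true, "stored": true}}'
echo "$PAYLOAD"

# Uncomment against a live core running ManagedIndexSchemaFactory:
# curl -s -X POST -H "Content-Type: application/json" \
#   --data-binary "$PAYLOAD" "https://${SOLR_HOST}/solr/${CORE}/schema"
```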

Step-by-Step Migration Instructions

  1. Open the Config Files Editor in your Opensolr Index control panel.
  2. Edit solrconfig.xml and locate the schemaFactory section. If you see ClassicIndexSchemaFactory, replace it with the ManagedIndexSchemaFactory snippet shown above. If there is no schemaFactory section at all, add the snippet just before any <requestHandler> definition.
  3. Save the file in the Config Files Editor.
  4. Reload your Opensolr Index by clicking the Reload button in the control panel.
  5. After the reload, Solr will automatically read your existing schema.xml and create a new managed-schema file from it. Your original schema.xml is preserved but will no longer be used.

What Changes in solrconfig.xml

The key change is the schemaFactory class. Here is a comparison:

Classic mode (schema.xml):

<schemaFactory class="ClassicIndexSchemaFactory"/>

Managed mode (managed-schema):

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

Setting mutable to true allows the Schema API to modify the schema at runtime. If you set it to false, the managed-schema is read-only and the Schema API will reject modification requests.

Common Pitfalls

  • Leaving both schemaFactory definitions: If your solrconfig.xml contains both ClassicIndexSchemaFactory and ManagedIndexSchemaFactory, Solr will fail to start. Make sure you remove the old definition entirely before adding the new one.
  • Editing managed-schema manually: Once you switch to managed mode, avoid editing the managed-schema file directly through the Config Files Editor. Use the Schema API instead. Manual edits can be overwritten by Solr on reload.
  • Forgetting to reload: The migration only happens when the index is reloaded after the solrconfig.xml change. If you just save without reloading, nothing changes.
  • Schema.xml syntax errors: If your existing schema.xml has syntax errors, the migration will fail on reload. Check the Error Log and fix any issues in schema.xml first, then switch the schemaFactory.

How to Verify the Migration Worked

  1. After reloading, check the Error Log in your Opensolr control panel. A clean reload with no errors means the migration succeeded.
  2. Try making a Schema API call to list your fields. You can do this with a simple cURL command:
    curl "https://YOUR_HOSTNAME/solr/YOUR_CORE/schema/fields"
    If you get a JSON response listing your fields, the Schema API is active and working.
  3. Verify that your existing fields and field types are intact by checking the response from the Schema API against your original schema.xml definitions.

If you ever need to switch back to classic schema.xml mode, see our FAQ on fixing schema.xml not taking effect, which explains how to restore the ClassicIndexSchemaFactory.
