The Solr EDisMax Query Parser: How It Works and How It Helps

Solr EDisMax Query Parser Guide

Combining the best of DisMax and Extended DisMax with a dash of humor and a nod to traditional Solr wisdom.


Introduction

Solr’s EDisMax (Extended DisMax) query parser is the workhorse for modern search applications. It builds upon the classic DisMax parser, providing more flexibility, advanced features, and sharper control over scoring. In the days of yore, we manually crafted complex Solr queries with multiple TF-IDF tweaks; today, EDisMax handles much of that complexity, letting you focus on practical relevance tuning (and perhaps nostalgically remember those heady days of manual schema edits).

This guide dives deep into EDisMax’s most important parameters:

  • qf
  • mm
  • pf, pf1, pf2
  • ps, ps1, ps2

We’ll cover:

  1. Parameter Overviews: What each parameter does.
  2. Practical Schema Definitions: Example schema.xml configurations.
  3. Query Examples: How to call EDisMax in practice.
  4. Detailed Explanations: Behind-the-scenes of scoring and slop.
  5. Keeping q Clean: Using qf to avoid polluting your query string.

1. EDisMax Parameter Overviews

1.1 qf (Query Fields)

  • Purpose: Defines which fields EDisMax searches and at what boost weights.
  • Why It Matters: Separates “What the user typed” (q) from Solr-specific logic (boosts). Keep your q as pure user intent.

Syntax:

qf=field1^boost1 field2^boost2 ...
  • field1, field2: field names defined in schema.xml.
  • boost (float): relative weight for that field.

Example:

qf=title^2.0 description^1.0 content^0.5
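To make the syntax concrete, here is a minimal Python sketch that splits a qf string into field/boost pairs. The helper name parse_qf is hypothetical (it is not part of Solr or of any client library), and treating a field without an explicit boost as weight 1.0 is an assumption of this sketch.

```python
def parse_qf(qf: str) -> dict:
    """Parse 'title^2.0 description^1.0' into {'title': 2.0, 'description': 1.0}."""
    boosts = {}
    for token in qf.split():
        field, sep, boost = token.partition("^")
        # Assumption of this sketch: a field listed without ^boost gets weight 1.0.
        boosts[field] = float(boost) if sep else 1.0
    return boosts


weights = parse_qf("title^2.0 description^1.0 content^0.5")
print(weights)
```

A helper like this is handy for sanity-checking boosts (for example, asserting that title outweighs content) before they go into a request.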

1.2 mm (Minimum “Should” Match)

  • Purpose: Controls how many terms in a multi-term query must match in the targeted fields.
  • Why It Matters: Balances recall vs. precision. Too strict (mm=100%) → few/no results; too loose (mm=0%) → noisy results.

Syntax:

mm=<value>
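  • Absolute: mm=2 (at least 2 terms must match)
  • Relative: mm=75% (75% of terms must match; Solr rounds the computed number down)
  • Conditional: mm=2<90% 5<100% (with 1–2 terms, all must match; with 3–5 terms, 90%; with more than 5 terms, 100%)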

Example:

mm=2<75% 4<90% 6<100%

Interpretation: For queries of 1 or 2 terms, all terms are required. For 3 or 4 terms, require 75% (rounded down: 2 of 3, 3 of 4). For 5 or 6 terms, require 90% (4 of 5, 5 of 6). For more than 6, require 100%.
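The arithmetic behind an mm spec is easy to get wrong by hand, so the following Python sketch mirrors the rules as the Solr Reference Guide describes them: a conditional "N<value" applies only when the query has more than N optional clauses, percentages are rounded down, and negative values mean "all but this many". The function name min_should_match is hypothetical, and the sketch assumes a spec without spaces around "<"; it is an approximation for reasoning about settings, not Solr's own code.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Return how many of clause_count optional clauses must match for an mm spec."""
    spec = spec.strip()
    if "<" in spec:
        # Conditional form: at or below the first threshold, every clause is required.
        result = clause_count
        for part in spec.split():
            bound, _, sub_spec = part.partition("<")
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        percent = int(spec[:-1])
        calc = int(clause_count * percent / 100)  # int() truncates toward zero
        result = clause_count + calc if percent < 0 else calc
    else:
        number = int(spec)
        result = clause_count + number if number < 0 else number
    # Never require more clauses than exist, and never fewer than zero.
    return max(0, min(result, clause_count))


for n in range(1, 8):
    print(n, min_should_match(n, "2<75% 4<90% 6<100%"))
```

Running the loop shows the required counts for 1 through 7 terms, which is a quick way to check a spec before putting it into solrconfig.xml.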


1.3 pf, pf1, pf2 (Phrase Fields)

EDisMax can detect phrase matches (adjacent tokens) and boost documents accordingly.

  • pf

    • Purpose: Apply a boost when the entire (multi-term) query appears as a contiguous phrase in the field.
    • Syntax: pf=fieldA^boostA fieldB^boostB ...
    • Usage: Best for multi-word queries (3+ terms). Combined with ps.
  • pf2

    • Purpose: Boost when any two-term phrase (bigram) from the query matches contiguously.
    • Syntax: pf2=fieldA^boostA fieldB^boostB ...
    • Usage: Useful to capture important two-word phrases even if the full phrase fails.
  • pf1

    • Caution: pf1 is not a documented EDisMax parameter. The Solr Reference Guide lists pf, pf2, and pf3 (word-triplet phrases), and Solr ignores request parameters it does not recognize.
    • Syntax shown in this guide: pf1=fieldA^boostA fieldB^boostB ...
    • Usage: Treat pf1 as a placeholder with no effect on scoring; single-word queries are scored through qf.
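To illustrate which phrases pf2 considers, this Python sketch lists the adjacent word groups (shingles) of a query. The helper word_shingles is hypothetical, and splitting on whitespace is a simplifying assumption: the real parser works on analyzed query clauses, so stop words, synonyms, and stemming can change the result.

```python
def word_shingles(query: str, size: int) -> list:
    """Return every run of `size` adjacent words in the query, in order."""
    words = query.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(word_shingles("quantum mechanics equations", 2))
print(word_shingles("quantum mechanics equations", 3))
```

With size 2 the sketch yields the word pairs that a pf2 boost would look for; a query shorter than the shingle size yields no phrases at all.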

1.4 ps, ps1, ps2 (Phrase Slops)

Phrase slop determines how far apart terms can be and still count as a phrase.

  • ps

    • Slop for pf fields.
    • Syntax: ps=<integer>
    • Example: ps=2 allows up to two token moves (e.g., “quick fox” matching “quick brown fox”).
  • ps2

    • Slop for pf2 (two-term phrases).
    • Syntax: ps2=<integer>
    • Example: ps2=1 lets the query pair “solar power” match “solar thermal power” (one intervening word). A reversed pair such as “power solar” needs a slop of at least 2.
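  • ps1

    • Intended as the slop for pf1. Like pf1, it is not a documented EDisMax parameter (the documented slop parameters are ps, ps2, and ps3).
    • Syntax shown in this guide: ps1=<integer>
    • Example: A single term has no slop to apply, so ps1 can be omitted.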
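Slop is easier to reason about with a small model. The Python sketch below estimates the smallest slop at which a phrase would match a token list: each query term's document position is shifted by its position in the query, and the spread of those shifted positions is the slop needed. The function min_slop is hypothetical, and this is a simplified model of Lucene's sloppy phrase matching that assumes no repeated query terms and no analysis (stemming, stop words).

```python
from itertools import product


def min_slop(query_terms, doc_terms):
    """Smallest slop at which the phrase matches under this simplified model, or None."""
    candidates = []
    for term in query_terms:
        hits = [i for i, token in enumerate(doc_terms) if token == term]
        if not hits:
            return None  # a query term is missing, so no phrase match is possible
        candidates.append(hits)
    best = None
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # each query term must sit at its own document position
        # Shift each document position by the term's position in the query.
        offsets = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        width = max(offsets) - min(offsets)
        if best is None or width < best:
            best = width
    return best


print(min_slop(["quick", "fox"], "quick brown fox".split()))
print(min_slop(["solar", "power"], "power solar".split()))
```

Under this model one intervening word costs a slop of 1, while swapping two adjacent words costs 2, which is why small slop values tolerate gaps better than reordering.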

2. Practical schema.xml Definitions

Below is an example schema.xml snippet illustrating how to set up fields commonly used with EDisMax. In this fictional “ClassicBook” index, we have fields for title, author, summary, and content.

<!-- schema.xml excerpt for EDisMax example -->
<schema name="classicbook" version="1.6">
  <!-- Field Types -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
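    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="1" preserveOriginal="1" catenateAll="1" catenateWords="1"/>
    <!-- Graph filters used at index time should be followed by a flatten step -->
    <filter class="solr.FlattenGraphFilterFactory"/>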
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
    <filter class="solr.LengthFilterFactory" min="1" max="500"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="1" preserveOriginal="1" catenateAll="1" catenateWords="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
    <filter class="solr.LengthFilterFactory" min="1" max="500"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  </fieldType>

  <!-- Fields -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="string" indexed="true" stored="true"/>

  <field name="summary" type="text_general" indexed="true" stored="true" multiValued="false"/>
  <field name="content" type="text_general" indexed="true" stored="false" multiValued="false"/>

  <!-- Copy Field: Aggregate searchable text -->
  <field name="text_all" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title"      dest="text_all"/>
  <copyField source="author"     dest="text_all"/>
  <copyField source="summary"    dest="text_all"/>
  <copyField source="content"    dest="text_all"/>

  <!-- Default Search Field: the <defaultSearchField> element was removed in Solr 7; set the df parameter (for example df=text_all) in the request handler defaults instead -->

  <!-- Unique Key -->
  <uniqueKey>id</uniqueKey>
</schema>

Notes on Fields:

  • text_all: Combined field with all searchable content. Use text_all in qf, pf, etc.
  • Field-Specific Boosting: We will boost title more than content because titles historically matter more to users.

Opensolr Tip: On Opensolr, you edit schema.xml and solrconfig.xml directly from the Control Panel — no SSH, no file system access. Upload your config, reload your index, and your EDisMax defaults are live.


3. Keeping q Clean with qf

A traditional Solr query might look like:

http://localhost:8983/solr/classicbook/select?q=title:"solar power"^2 summary:"solar power"^1

But that pollutes q with boost logic. Instead, use:

  • q: raw user text, e.g.: solar power
  • qf: field boosts
  • bq, bf: additional boosts (optionally)

Example (clean q):

q=solar power
&defType=edismax
&qf=title^3.0 summary^1.5 text_all^0.5
  • Explanation:
    • title matches carry the highest weight (3.0).
    • summary matches carry half the title weight (1.5).
    • text_all keeps the engine honest but low weight (0.5).

Your query string remains user-centric.
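In application code, keeping q clean usually means passing the user's text through untouched and attaching the Solr-specific parameters separately. Here is a minimal Python sketch using only the standard library; the helper build_select_url is hypothetical, and the base URL is the local example address used elsewhere in this guide.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, user_query: str) -> str:
    """Build a /select URL where q holds only the user's text."""
    params = {
        "q": user_query,  # raw user input, no field names or boosts
        "defType": "edismax",
        "qf": "title^3.0 summary^1.5 text_all^0.5",  # relevance logic lives here
    }
    # urlencode percent-encodes characters such as ^ and turns spaces into +.
    return base_url + "/select?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr/classicbook", "solar power")
print(url)
```

Because the boosts never touch q, the same function works for any user input, and relevance changes stay in one place.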


4. EDisMax in Action: Query Examples

Below are several search examples illustrating how EDisMax parameters affect results.

4.1 Basic Keyword Search with qf and mm

Request:

GET /solr/classicbook/select?
  q=ancient philosophy
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=75%
  &hl=true

What Happens:

  1. Tokenization: “ancient” and “philosophy”
  2. mm=75%:
    • For 2 terms, 75% of 2 = 1.5, which Solr rounds down to 1, so only one term must match.
  3. Field Scoring:
    • Matches in title count double compared to summary.
    • Matches in content count minimal.

A document with “ancient” in title but not “philosophy” is still returned, because one matching term satisfies mm=75%. Documents where both words appear score higher, especially with title matches. To require both terms, use mm=100% or mm=2.


4.2 Phrase Boosting with pf and ps

Request:

GET /solr/classicbook/select?
  q=ancient philosophy treatise
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=2<75% 3<90% 4<100%
  &pf=title^5 summary^3
  &ps=2

What Happens:

  1. Term Matching:

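    • mm=2<75% 3<90% 4<100%: With 3 terms, the 75% rule applies → 3 × 0.75 = 2.25, rounded down to 2 terms.
    • A 4-term query would need 90% → 3.6, rounded down to 3 terms; 5 or more terms need 100%.
    • Effect: Two of the three terms must match, so one may be missing.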
  2. Phrase Boost (pf):

    • If “ancient philosophy treatise” appears (with a slop of up to 2, ps=2) in title or summary, a significant boost is applied.
    • E.g., “ancient Greek philosophy treatise” (one extra word in between) still qualifies.
  3. Scoring Order:

    • Exact phrase in title > phrase in summary > term matches alone.

4.3 Two-Term Phrase with pf2 and ps2

Suppose we want to capture strong two-word phrases:

GET /solr/classicbook/select?
  q=quantum mechanics equations
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=2<75% 3<90% 4<100%
  &pf2=title^4 summary^2
  &ps2=1

What Happens:

  • For any two-term phrase from “quantum mechanics equations” (e.g., “quantum mechanics”, “mechanics equations”):
    • If “quantum mechanics” appears with slop ≤ 1 in title, boost by 4.
    • In summary, boost by 2.

Thus, even if the full three-term phrase isn’t present, two-term pairs can surface important context.


4.4 One-Term Phrase with pf1 and ps1

For completeness:

GET /solr/classicbook/select?
  q=philosophy
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=1
  &pf1=title^3 summary^1
  &ps1=0
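  • Single-term query “philosophy.”
  • pf1 is meant to boost documents where “philosophy” appears in title (×3) or summary (×1). EDisMax does not document a pf1 parameter, so in practice the title^2 and summary^1 weights in qf do this work.
  • ps1=0 is likewise a no-op; a single term has no slop.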

5. Deep Dive into Parameters

5.1 qf: The Heart of Clean Queries

<!-- In solrconfig.xml (RequestHandler) -->
<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">
      title^3.0
      summary^1.5
      text_all^0.5
    </str>
    <str name="mm">2&lt;75% 4&lt;90% 6&lt;100%</str>
    <str name="pf">title^4 summary^2</str>
    <str name="ps">2</str>
    <str name="pf2">title^3 summary^1</str>
    <str name="ps2">1</str>
    <str name="pf1">title^2 summary^1</str>
    <str name="ps1">0</str>
  </lst>
</requestHandler>
  • Why Defaults?
    Embedding your qf and mm in defaults ensures consistency across all calls. You can override at query time if needed.

  • Traditional Tip:
    “Once you set sensible default boosts, you save yourself countless hours of tweaking individual requests.”

Beyond Config Files: Opensolr's Search Tuning gives you visual sliders for field weights, minimum match, freshness boost, and the balance between keyword and semantic scoring — no XML editing required. Changes take effect on the next query, no reload needed.

5.2 mm: Balancing Recall vs. Precision

  • Absolute vs. Percentage:

    • Use absolute (mm=2) for short queries (2–3 words).
    • Use relative (mm=50%) for longer queries.
  • Composite Syntax:

    • mm=2<75% 4<90% 6<100%:
      • 1–2 terms → all terms must match.
      • 3–4 terms → 75%, rounded down → 2 of 3 or 3 of 4.
      • 5–6 terms → 90%, rounded down → 4 of 5 or 5 of 6.
      • 7+ terms → 100% matches (strict).

Pro Tip (Traditional Wisdom): If you leave mm unset, EDisMax defaults to 0% (or 100% when q.op=AND). With the default OR operator, a document matching any single term qualifies, and you might end up with the dreaded “too many results” syndrome.


5.3 pf, pf1, pf2: Phrase Boosting Strategies

  • Why Phrase Boost?
    Users often type queries that imply an exact phrase (“sherlock holmes stories”). Reward documents that honor the phrase. It’s like giving a tip to the bartender for a well-made drink—acknowledge exactly what was asked.

  • pf (General Phrase Boost):

    • Targets the entire query sequence.
    • Combined with ps to allow “wiggle room” (slop).
  • pf2 (Bigram Boost):

    • Useful when the full phrase fails or user typed a longer phrase.
    • Captures strong two-word signals.
  • pf1 (Unigram Boost):

    • Intended for consistency in single-word queries.
    • Not a documented EDisMax parameter; single-word queries get their field weighting from qf.
  • Fine-Tuning Slop (ps, ps2, ps1):

    • ps=2: Up to 2-term gaps in phrase.
    • ps2=1: Up to 1-term gap in bigrams.
    • ps1=0: No effect; a single term has no gap to allow.

5.4 Putting It All Together: Sample solrconfig.xml

<config>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>

      <!-- Core Query Fields -->
      <str name="qf">
        title^3.0
        summary^1.5
        text_all^0.5
      </str>

      <!-- Minimum “Should” Match -->
      <str name="mm">2&lt;75% 4&lt;90% 6&lt;100%</str>

      <!-- Phrase Boosts -->
      <str name="pf">title^5 summary^3</str>
      <str name="ps">2</str>
      <str name="pf2">title^4 summary^2</str>
      <str name="ps2">1</str>
      <str name="pf1">title^3 summary^1</str>
      <str name="ps1">0</str>

      <!-- Highlighting Defaults -->
      <str name="hl">true</str>
      <str name="hl.fl">title,summary,content</str>
    </lst>
  </requestHandler>
</config>

Explanation:

  1. qf: Splits search across fields, boosting title most.
  2. mm: Balances how many terms must match.
  3. pf: Big phrase boost to reward exact (or near-exact) matches.
  4. pf2: Two-word phrase boost, capturing key bigrams.
  5. pf1: Intended as a single-term boost; it is not a documented EDisMax parameter, so one-word queries rely on the qf weights.
  6. ps, ps2, ps1: Slop controls—allowing some wiggle but not too much.

6. Advanced Examples & Scenarios

6.1 Long Queries with pf2 and pf

Scenario:
A user queries: “quantum field theory experiments at low temperatures”.

GET /solr/classicbook/select?
  q=quantum field theory experiments at low temperatures
  &defType=edismax
  &qf=title^2 summary^1 text_all^0.3
  &mm=4<75% 7<90% 10<100%
  &pf=title^5 summary^3
  &ps=2
  &pf2=title^4 summary^2
  &ps2=1
  • Term Count: 7 terms.
  • mm: 7 terms is more than 4 but not more than 7, so the 75% rule applies → 7 × 0.75 = 5.25 → 5 terms. (The 90% rule starts at 8 terms.)
  • Phrase Logic:
    • If the full query “quantum field theory experiments at low temperatures” appears (with slop ≤ 2) in title, massive boost.
    • Even if that fails, any two-word pairs like “quantum field” or “field theory” get a smaller boost.

Vintage Wisdom: Back in the day, we hand-crafted this logic with multiple q clauses. Now, EDisMax elegantly bundles it.


6.2 Query-Time Overrides

Though defaults serve most cases, you can override parameters on the fly:

GET /solr/classicbook/select?
  q=renaissance art paintings
  &defType=edismax
  &qf=title^4 summary^2 text_all^0.4
  &pf=title^6 summary^3
  &ps=3
  &mm=2<80% 5<90% 8<100%
  &rows=20
  &sort=score desc, publish_date desc
  • Query-Specific Boosts: Bump title further for “art”-centric queries.
  • Phrase Slop: Increased to 3 to allow permutations like “art of renaissance paintings”.
  • Sorting: Traditional “score then date” ordering (this assumes a publish_date field, which the example schema in Section 2 does not define).
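The precedence rule at work here can be pictured as a dictionary merge: parameters sent with the request replace the handler defaults of the same name, and defaults that are not overridden stay in effect. The Python sketch below models that; HANDLER_DEFAULTS copies values from the sample solrconfig.xml in Section 5.4, and effective_params is a hypothetical helper, not a Solr API.

```python
HANDLER_DEFAULTS = {
    "defType": "edismax",
    "qf": "title^3.0 summary^1.5 text_all^0.5",
    "mm": "2<75% 4<90% 6<100%",
    "pf": "title^5 summary^3",
    "ps": "2",
    "pf2": "title^4 summary^2",
    "ps2": "1",
}


def effective_params(defaults: dict, request_params: dict) -> dict:
    """Model of default handling: request parameters win over handler defaults."""
    merged = dict(defaults)  # copy so the defaults themselves are not modified
    merged.update(request_params)
    return merged


request = {
    "q": "renaissance art paintings",
    "qf": "title^4 summary^2 text_all^0.4",
    "pf": "title^6 summary^3",
    "ps": "3",
    "mm": "2<80% 5<90% 8<100%",
    "rows": "20",
}
params = effective_params(HANDLER_DEFAULTS, request)
print(params["ps"], params["pf2"])
```

Note that pf2 and ps2 were not sent with the request, so the handler defaults for them still apply to this query.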

7. Best Practices and Traditional Tips

  1. Set Sensible Defaults:

    • Tune qf and mm in your solrconfig.xml defaults. Overrides are the exception, not the rule.
  2. Phrase Boosting Wisdom:

    • Start with pf + ps=2. See if that yields decent results. Add pf2 if long queries are common.
    • Resist the urge to crank ps too high—phrase logic gets too loose.
  3. Keep q Pure:

    • Always let q be user-provided text. Place boosting/logic in separate parameters like qf, bq, bf.
  4. Iterate with Analytics:

    • Review Solr’s query logs. Are users often getting no results? Adjust mm.
    • Are phrase matches failing? Tweak ps or add pf2.
    • Opensolr's Click Analytics tracks which results users actually click — high impressions with low CTR tells you exactly which queries need tuning.
    • Error Audit catches malformed queries and parsing errors automatically, so you spot problems before users complain.
  5. Legacy Reminder:

    • If you remember basic Lucene queries (+title:(ancient philosophy) +summary:(ancient philosophy)), EDisMax is your elegant shortcut. Honor the old ways by acknowledging them, but embrace convenience.
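For the log review suggested in item 4, a small script can surface the queries that return nothing. The Python sketch below assumes request log lines that contain params={...} followed by hits=N; that layout is an assumption to check against your own logs (it varies with Solr version and logging configuration), the sample lines are made up for illustration, and zero_hit_queries is a hypothetical helper.

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Assumed line layout: "... params={q=...&defType=edismax} hits=0 status=0 QTime=3"
LINE = re.compile(r"params=\{(?P<params>[^}]*)\}.*?\bhits=(?P<hits>\d+)")


def zero_hit_queries(lines):
    """Count the q values of logged requests that returned zero hits."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match or int(match.group("hits")) != 0:
            continue
        for pair in match.group("params").split("&"):
            key, _, value = pair.partition("=")
            if key == "q":
                counts[unquote_plus(value)] += 1
    return counts


sample = [  # illustrative lines, not real log output
    "INFO path=/select params={q=ancient+philosophy&defType=edismax} hits=0 status=0 QTime=3",
    "INFO path=/select params={q=solar+power&defType=edismax} hits=12 status=0 QTime=5",
    "INFO path=/select params={q=ancient+philosophy&defType=edismax} hits=0 status=0 QTime=2",
]
print(zero_hit_queries(sample).most_common())
```

Queries that show up repeatedly in this list are good candidates for a looser mm setting.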

8. Summary

EDisMax streamlines complex boolean and phrase-scoring logic into a cohesive, user-friendly API. By mastering:

  • qf: Keep your query field mappings clear and boost weights sensible.
  • mm: Balance recall and precision with thoughtful minimum-match.
  • pf, pf1, pf2, ps, ps1, ps2: Elevate phrase relevance without over-engineering.

…you retain the “traditional Solr craftsmanship” while leveraging modern conveniences. May your search results be ever relevant, your phrase boosts well-balanced, and your schema forever flexible.



9. Opensolr: EDisMax Without the Infrastructure

Everything in this guide — schema.xml field types, solrconfig.xml request handlers, EDisMax parameter tuning — is what you normally set up on your own Solr servers. Opensolr handles the infrastructure so you can focus on search quality instead of server management.

Here is what that looks like in practice:

  • Managed Solr Hosting — Production-ready Solr indexes with resilient clusters (master + replicas), automatic failover, and global data centers. No JVM tuning, no ZooKeeper, no disk monitoring. You get a Solr endpoint and start indexing. (How Opensolr Works)

  • Browser-Based Config Editing — The schema.xml and solrconfig.xml examples in this guide? On Opensolr, you edit them directly from the Control Panel. Upload, reload, done. No SSH. (Platform Guide)

  • Search Tuning — No XML Needed — Visual sliders to control field weights, minimum match, freshness boost, and keyword vs semantic balance. Every EDisMax parameter discussed in Sections 1–5 above can be tuned visually, with changes taking effect on the next query. (Search Tuning)

  • Query Elevation — When EDisMax ranking is not enough for specific queries, pin documents to the top or exclude irrelevant results — per query, instantly, no reindexing. (Query Elevation)

  • Click Analytics & CTR — Section 7 says "iterate with analytics." Opensolr tracks what users search and what they click. High impressions with low CTR means that query needs mm or pf tuning. (Click Analytics)

  • Error Audit — Every Solr error, warning, and stack trace from the last 7 days, parsed and searchable from the Control Panel. When a query fails because of bad syntax or a missing field, you know immediately — not from a user complaint. (Error Audit)

  • Hybrid Search — Combine the keyword-based EDisMax scoring from this guide with vector/semantic search using 1024-dim BGE-m3 embeddings. Same index, same query — keyword precision plus semantic understanding. (Hybrid Search)

  • Web Crawler — Need content to search? Point the crawler at a URL. It handles crawling, text extraction (HTML, PDF, DOCX), AI enrichment, vector embeddings, and indexing. A production search engine without writing a single indexing pipeline. (Web Crawler)

  • Data Ingestion API — For content the crawler cannot reach (databases, internal systems, product feeds), push documents via REST API with automatic vector embedding, sentiment analysis, and language detection built in. (Data Ingestion API)

The EDisMax knowledge in this guide applies directly on Opensolr — same parameters, same behavior, same Solr underneath. The difference is you skip the infrastructure work and get tools that make tuning visual and measurable.


Written with respect for the traditions of Solr and a wink to the days of manual Lucene queries. Enjoy!