Combining the best of DisMax and Extended DisMax with a dash of humor and a nod to traditional Solr wisdom.
Solr’s EDisMax (Extended DisMax) query parser is the workhorse for modern search applications. It builds upon the classic DisMax parser, providing more flexibility, advanced features, and sharper control over scoring. In the days of yore, we manually crafted complex Solr queries with multiple TF-IDF tweaks; today, EDisMax handles much of that complexity, letting you focus on practical relevance tuning (and perhaps nostalgically remember those heady days of manual schema edits).
This guide dives deep into EDisMax’s most important parameters:
- qf
- mm
- pf, pf1, pf2
- ps, ps1, ps2

We’ll cover:

- Sample schema.xml configurations.
- Keeping q Clean: Using qf to avoid polluting your query string.

qf (Query Fields)

qf separates the user's query (q) from Solr-specific logic (boosts). Keep your q as pure user intent.

Syntax:
qf=field1^boost1 field2^boost2 ...
- field1, field2: field names defined in schema.xml.
- boost (float): relative weight for that field.

Example:
qf=title^2.0 description^1.0 content^0.5
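To make the qf syntax concrete, here is a minimal Python sketch that splits a qf value into field/boost pairs. The helper name parse_qf is hypothetical (it is not part of Solr or any client library); it only illustrates the field^boost convention, with fields that carry no explicit boost treated as 1.0.

```python
def parse_qf(qf: str) -> dict:
    """Parse a qf-style string such as 'title^2.0 content^0.5' into {field: boost}."""
    boosts = {}
    for token in qf.split():
        # "title^2.0" -> ("title", "^", "2.0"); "title" -> ("title", "", "")
        field, sep, boost = token.partition("^")
        boosts[field] = float(boost) if sep else 1.0
    return boosts


weights = parse_qf("title^2.0 description^1.0 content^0.5")
print(weights)
```

A check like this can be handy in tests or admin tooling to confirm that a configured qf string says what you think it says before it reaches Solr.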
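mm (Minimum “Should” Match)

mm sets how many of the optional query terms a document must match. Too strict (mm=100%) → few/no results; too loose (mm=0%) → noisy results.

Syntax:
mm=<value>

- Absolute: mm=2 (at least 2 terms must match)
- Percentage: mm=75% (75% of terms must match, rounded down)
- Conditional: mm=2<90% 5<100%

Example: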
mm=2<75% 4<90% 6<100%
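Interpretation: For queries of 1–2 terms, every term is required. For 3–4 terms, 75% must match. For 5–6 terms, 90% must match. For more than 6 terms, 100% must match. Each n<spec clause applies only when the query has more than n terms, and percentages round down (75% of 3 terms = 2.25 → 2).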
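The mm arithmetic is easy to get wrong by hand, so here is a small Python sketch of the documented rules: an absolute number, a percentage that rounds down, negative values meaning "all but this many", and conditional n<spec clauses. The function name min_should_match is hypothetical, and this is a model of the rules for experimenting with specs, not Solr's actual implementation.

```python
def _simple(num_terms: int, spec: str) -> int:
    """Evaluate a non-conditional spec such as '2', '-1', '75%' or '-25%'."""
    if spec.endswith("%"):
        pct = int(spec[:-1])
        if pct < 0:
            # Negative percentage: this share of the terms may be missing.
            result = num_terms - (num_terms * -pct) // 100
        else:
            # Positive percentage: round down.
            result = (num_terms * pct) // 100
    else:
        value = int(spec)
        result = num_terms + value if value < 0 else value
    # Never require more terms than exist, never fewer than zero.
    return max(0, min(num_terms, result))


def min_should_match(num_terms: int, spec: str) -> int:
    """Return how many of num_terms optional terms must match under spec."""
    spec = spec.strip()
    if "<" not in spec:
        return _simple(num_terms, spec)
    result = num_terms  # at or below the first bound, every term is required
    for clause in spec.split():
        bound, sub_spec = clause.split("<", 1)
        if num_terms <= int(bound):
            return result
        result = _simple(num_terms, sub_spec)
    return result


for n in range(1, 8):
    print(n, min_should_match(n, "2<75% 4<90% 6<100%"))
```

Running the loop prints the required match count for queries of one to seven terms, which makes it straightforward to sanity-check a spec before putting it in solrconfig.xml.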
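pf, pf1, pf2 (Phrase Fields)

EDisMax can detect phrase matches (adjacent tokens) and boost documents accordingly.

pf
- Boosts documents in which the whole query appears as a phrase in the listed fields.
- Syntax: pf=fieldA^boostA fieldB^boostB ...
- Slop is controlled by ps.

pf2
- Boosts documents in which a pair of adjacent query words (a bigram) appears as a phrase.
- Syntax: pf2=fieldA^boostA fieldB^boostB ...

pf1
- Syntax: pf1=fieldA^boostA fieldB^boostB ...
- Similar to qf behavior but reserved for phrase logic.
- Note: the Solr Reference Guide documents pf, pf2, and pf3 (with ps, ps2, and ps3) for EDisMax, but not pf1 or ps1. Confirm that your Solr version honors them before relying on them.

ps, ps1, ps2 (Phrase Slops)

Phrase slop determines how far apart terms can be and still count as a phrase.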
ps
- Applies to pf fields.
- Syntax: ps=<integer>
- Example: ps=2 allows up to two token moves (e.g., “quick fox” matching “quick brown fox”).

ps2
- Applies to pf2 (two-term phrases).
- Syntax: ps2=<integer>
- Example: ps2=1. If the query is “solar power”, it will still match “solar thermal power”, where one extra word sits between the terms. Reordered text such as “power of solar” needs a larger slop.

ps1
- Applies to pf1 (single-term).
- Syntax: ps1=<integer> (though slop rarely matters for single-term).
- Recommendation: ps1=0 (exact match) or omit.

schema.xml Definitions

Below is an example schema.xml snippet illustrating how to set up fields commonly used with EDisMax. In this fictional “ClassicBook” index, we have fields for title, author, summary, and content.
<!-- schema.xml excerpt for EDisMax example -->
<schema name="classicbook" version="1.6">

  <!-- Field Types -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Fields -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="summary" type="text_general" indexed="true" stored="true" multiValued="false"/>
  <field name="content" type="text_general" indexed="true" stored="false" multiValued="false"/>

  <!-- Copy Field: Aggregate searchable text.
       text_all must be multiValued because it receives several sources. -->
  <field name="text_all" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="text_all"/>
  <copyField source="author" dest="text_all"/>
  <copyField source="summary" dest="text_all"/>
  <copyField source="content" dest="text_all"/>

  <!-- Default Search Field (deprecated; Solr 7.0+ rejects this element,
       so set df in the request handler there instead) -->
  <defaultSearchField>text_all</defaultSearchField>

  <!-- Unique Key -->
  <uniqueKey>id</uniqueKey>

</schema>
Notes on Fields:

- text_all: Combined field with all searchable content. Use text_all in qf, pf, etc.
- Boost title more than content because titles historically matter more to users.

Keeping q Clean with qf

A traditional Solr query might look like:
http://localhost:8983/solr/classicbook/select?q=title:"solar power"^2 summary:"solar power"^1
But that pollutes q with boost logic. Instead, use:
- q: raw user text, e.g.: solar power
- qf: field boosts
- bq, bf: additional boosts (optionally)

Example (clean q):

q=solar power
&defType=edismax
&qf=title^3.0 summary^1.5 text_all^0.5

- title matches count triple weight.
- summary matches count 1.5× weight.
- text_all keeps the engine honest but low weight.

Your query string remains user-centric.
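The parameters in this guide are written unencoded for readability, but spaces, ^, < and % must be URL-encoded in a real request. The following Python sketch shows one way to build such a URL with the standard library while keeping q as the raw user text. The helper name build_select_url is hypothetical, the host and core are taken from the example URL above, and the mm value is included only to show how < and % are encoded.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base: str, user_text: str, **params: str) -> str:
    """Build an EDisMax /select URL: q stays raw, tuning lives in other params."""
    query = {"q": user_text, "defType": "edismax", **params}
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"{base}/select?{urlencode(query)}"


url = build_select_url(
    "http://localhost:8983/solr/classicbook",
    "solar power",
    qf="title^3.0 summary^1.5 text_all^0.5",
    mm="2<75% 4<90% 6<100%",
)
print(url)

# Decoding the query string recovers the original, human-readable values.
print(parse_qs(urlsplit(url).query))
```

Because the user text travels only in q, the same helper works unchanged whether the boosts come from the request or from handler defaults.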
Below are several search examples illustrating how EDisMax parameters affect results.
qf and mm

Request:

GET /solr/classicbook/select?
    q=ancient philosophy
    &defType=edismax
    &qf=title^2 summary^1 content^0.2
    &mm=75%
    &hl=true

What Happens:

- mm=75%: the query has two terms, and 2 × 0.75 = 1.5 rounds down to 1, so at least one term must match.
- qf: matches in title count double compared to summary; matches in content count minimally.

If a document has “ancient” in title but not “philosophy,” it still matches, because only one term is required; set mm=100% (or mm=2) to require both. Documents in which both words appear generally score higher, with title matches boosted most.
pf and ps

Request:

GET /solr/classicbook/select?
    q=ancient philosophy treatise
    &defType=edismax
    &qf=title^2 summary^1 content^0.2
    &mm=2<75% 3<90% 4<100%
    &pf=title^5 summary^3
    &ps=2

What Happens:

Term Matching:
- mm=2<75% 3<90% 4<100%: the query has 3 terms, so the 2<75% clause applies → 3 × 0.75 = 2.25, rounded down → at least 2 terms must match.

Phrase Boost (pf):
- If the phrase “ancient philosophy treatise” appears (within ps=2) in title or summary, a significant boost is applied.

Scoring Order:
- phrase in title > phrase in summary > term matches alone.

pf2 and ps2

Suppose we want to capture strong two-word phrases:
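GET /solr/classicbook/select?
    q=quantum mechanics equations
    &defType=edismax
    &qf=title^2 summary^1 content^0.2
    &mm=2<75% 3<90% 4<100%
    &pf2=title^4 summary^2
    &ps2=1

What Happens:

- pf2 splits the query into the adjacent word pairs “quantum mechanics” and “mechanics equations”.
- If a pair appears as a phrase (within ps2=1) in title, boost by 4.
- If a pair appears as a phrase in summary, boost by 2.

Thus, even if the full three-term phrase isn’t present, two-term pairs can surface important context.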
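To build intuition for how bigram phrases and slop interact, here is a simplified Python model. It splits text on whitespace, does no stemming or other analysis, and estimates the slop a phrase needs as the spread of the term positions after subtracting each term's offset in the phrase. Both function names are hypothetical, and the model is only an approximation for phrases without repeated terms, not Lucene's actual matcher.

```python
from itertools import product


def bigrams(query: str) -> list:
    """Return the adjacent word pairs of a query, as pf2 would consider them."""
    words = query.lower().split()
    return [tuple(words[i:i + 2]) for i in range(len(words) - 1)]


def slop_needed(phrase, text: str):
    """Smallest slop under which phrase matches text in this simplified model.

    Returns None when any phrase term is missing from the text.
    """
    tokens = text.lower().split()
    positions = []
    for term in phrase:
        hits = [i for i, tok in enumerate(tokens) if tok == term]
        if not hits:
            return None
        positions.append(hits)
    best = None
    for combo in product(*positions):
        # Shift each position by the term's offset in the phrase; an exact
        # in-order match makes every shifted value equal (spread 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        best = spread if best is None else min(best, spread)
    return best


for pair in bigrams("quantum mechanics equations"):
    print(pair, slop_needed(pair, "equations of quantum statistical mechanics"))
```

In this model an intervening word costs one unit of slop, while reversed terms cost more, which is why a small ps2 tolerates an extra word but not a reordering.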
pf1 and ps1

For completeness:

GET /solr/classicbook/select?
    q=philosophy
    &defType=edismax
    &qf=title^2 summary^1 content^0.2
    &mm=1
    &pf1=title^3 summary^1
    &ps1=0

- pf1 boosts documents where “philosophy” appears in title (×3) or summary (×1).
- ps1=0 means exact match; no slop needed.

qf: The Heart of Clean Queries

<!-- In solrconfig.xml (RequestHandler) -->
<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">
      title^3.0 summary^1.5 text_all^0.5
    </str>
    <!-- "<" must be escaped as &lt; inside XML -->
    <str name="mm">2&lt;75% 4&lt;90% 6&lt;100%</str>
    <str name="pf">title^4 summary^2</str>
    <str name="ps">2</str>
    <str name="pf2">title^3 summary^1</str>
    <str name="ps2">1</str>
    <str name="pf1">title^2 summary^1</str>
    <str name="ps1">0</str>
  </lst>
</requestHandler>
Why Defaults?
Embedding your qf and mm in defaults ensures consistency across all calls. You can override at query time if needed.
Traditional Tip:
“Once you set sensible default boosts, you save yourself countless hours of tweaking individual requests.”
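The precedence between handler defaults and request parameters can be modelled in a few lines of Python. This is only a sketch of the idea that a parameter sent with the request replaces the default of the same name; the dictionary and the function name effective_params are hypothetical, and the values mirror the defaults shown in the configuration above.

```python
# Mirrors part of the <lst name="defaults"> block from the handler config.
HANDLER_DEFAULTS = {
    "defType": "edismax",
    "qf": "title^3.0 summary^1.5 text_all^0.5",
    "mm": "2<75% 4<90% 6<100%",
    "pf": "title^4 summary^2",
    "ps": "2",
}


def effective_params(defaults: dict, request: dict) -> dict:
    """Request parameters win; defaults fill in whatever the request omits."""
    return {**defaults, **request}


# A request that only supplies user text and loosens the phrase slop.
params = effective_params(HANDLER_DEFAULTS, {"q": "renaissance art paintings", "ps": "3"})
print(params)
```

With defaults in place, most requests need to send nothing more than q, and an override stays visible as a deliberate exception.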
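mm: Balancing Recall vs. Precision

Absolute vs. Percentage:
- Use an absolute value (mm=2) for short queries (2–3 words).
- Use a percentage (mm=50%) for longer queries.

Composite Syntax:
- mm=2<75% 4<90% 6<100%:
  - 1–2 terms: all must match.
  - 3–4 terms: 75% must match.
  - 5–6 terms: 90% must match.
  - 7 or more terms: 100% must match.

Pro Tip (Traditional Wisdom): If you haven’t set mm and rely on the default, you might end up with the dreaded “too many results” syndrome, because with the default OR operator a document needs to match only one query term.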
pf, pf1, pf2: Phrase Boosting Strategies

Why Phrase Boost?
Users often type queries that imply an exact phrase (“sherlock holmes stories”). Reward documents that honor the phrase. It’s like giving a tip to the bartender for a well-made drink—acknowledge exactly what was asked.
pf (General Phrase Boost):
- Rewards documents that contain the whole query as a phrase. Pair it with ps to allow “wiggle room” (slop).

pf2 (Bigram Boost):
- Rewards two-word phrases, capturing key bigrams.

pf1 (Unigram Boost):
- Boosts single-term matches, so one-word queries still get a leg up.

Fine-Tuning Slop (ps, ps2, ps1):
- ps=2: Up to 2-term gaps in phrase.
- ps2=1: Up to 1-term gap in bigrams.
- ps1=0: No gap for single-term (exact).

solrconfig.xml

<config>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>

      <!-- Core Query Fields -->
      <str name="qf">
        title^3.0 summary^1.5 text_all^0.5
      </str>

      <!-- Minimum “Should” Match ("<" must be escaped as &lt; inside XML) -->
      <str name="mm">2&lt;75% 4&lt;90% 6&lt;100%</str>

      <!-- Phrase Boosts -->
      <str name="pf">title^5 summary^3</str>
      <str name="ps">2</str>
      <str name="pf2">title^4 summary^2</str>
      <str name="ps2">1</str>
      <str name="pf1">title^3 summary^1</str>
      <str name="ps1">0</str>

      <!-- Highlighting Defaults -->
      <str name="hl">true</str>
      <str name="hl.fl">title,summary,content</str>
    </lst>
  </requestHandler>
</config>
Explanation:

- qf: Splits search across fields, boosting title most.
- mm: Balances how many terms must match.
- pf: Big phrase boost to reward exact (or near-exact) matches.
- pf2: Two-word phrase boost, capturing key bigrams.
- pf1: Single-term phrase boost, ensuring one-word queries still get a leg up.
- ps, ps2, ps1: Slop controls, allowing some wiggle but not too much.

pf2 and pf

Scenario:
A user queries: “quantum field theory experiments at low temperatures”.
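GET /solr/classicbook/select?
    q=quantum field theory experiments at low temperatures
    &defType=edismax
    &qf=title^2 summary^1 text_all^0.3
    &mm=4<75% 7<90% 10<100%
    &pf=title^5 summary^3
    &ps=2
    &pf2=title^4 summary^2
    &ps2=1

- mm: the query has 7 terms, so the 4<75% clause applies (7<90% only takes effect above 7 terms) → 7 × 0.75 = 5.25, rounded down → 5 terms must match.
- pf2: adjacent word pairs such as “quantum field” or “field theory” found in title or summary (within ps2=1) earn a boost.
- pf: if the full phrase appears (within ps=2) in title, massive boost.

Vintage Wisdom: Back in the day, we hand-crafted this logic with multiple q clauses. Now, EDisMax elegantly bundles it.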
Though defaults serve most cases, you can override parameters on the fly:
GET /solr/classicbook/select?
    q=renaissance art paintings
    &defType=edismax
    &qf=title^4 summary^2 text_all^0.4
    &pf=title^6 summary^3
    &ps=3
    &mm=2<80% 5<90% 8<100%
    &rows=20
    &sort=score desc, publish_date desc
- Boost title further for “art”-centric queries.
- Raise ps to 3 to allow permutations like “art of renaissance paintings”.

Set Sensible Defaults:
- Define qf and mm in your solrconfig.xml defaults. Overrides are the exception, not the rule.

Phrase Boosting Wisdom:
- Start with pf + ps=2. See if that yields decent results. Add pf2 if long queries are common.
- Avoid setting ps too high; phrase logic gets too loose.

Keep q Pure:
- Let q be user-provided text. Place boosting/logic in separate parameters like qf, bq, bf.

Iterate with Analytics:
- If users see too few or too many results, adjust mm.
- If phrase matches rank poorly, tune ps or add pf2.

Legacy Reminder:
- If you remember hand-writing queries (e.g., +title:(ancient philosophy) +summary:(ancient philosophy)), EDisMax is your elegant shortcut. Honor the old ways by acknowledging them, but embrace convenience.

EDisMax streamlines complex boolean and phrase-scoring logic into a cohesive, user-friendly API. By mastering:

- qf: Keep your query field mappings clear and boost weights sensible.
- mm: Balance recall and precision with thoughtful minimum-match.
- pf, pf1, pf2, ps, ps1, ps2: Elevate phrase relevance without over-engineering.

…you retain the “traditional Solr craftsmanship” while leveraging modern conveniences. May your search results be ever relevant, your phrase boosts well-balanced, and your schema forever flexible.
Written with respect for the traditions of Solr and a wink to the days of manual Lucene queries. Enjoy!