Documentation

Select a category on the left, to get your answers quickly

AI-RAG-NLP-API API-Backup Managem... API-Cluster Manage... API-Config Files API-Index Manageme... API-Logs & Analyti... API-Security API-Web Crawler Billing CMS-Drupal CMS-Moodle CMS-Omeka CMS-Wordpress Data Backup Errors in Solr Opensolr Configura... Security Solr Manager Contr... Solr Tika Integrat... Teams Management Web Crawler Wiki

The Solr edismax query Parser. How it works and how it helps.

Solr EDisMax Query Parser Guide

Combining the best of DisMax and Extended DisMax with a dash of humor and a nod to traditional Solr wisdom.

Introduction

Solr’s EDisMax (Extended DisMax) query parser is the workhorse for modern search applications. It builds upon the classic DisMax parser, providing more flexibility, advanced features, and sharper control over scoring. In the days of yore, we manually crafted complex Solr queries with multiple TF-IDF tweaks; today, EDisMax handles much of that complexity, letting you focus on practical relevance tuning (and perhaps nostalgically remember those heady days of manual schema edits).

This guide dives deep into EDisMax’s most important parameters:

qf
mm
pf, pf1, pf2
ps, ps1, ps2

We’ll cover:

Parameter Overviews: What each parameter does.
Practical Schema Definitions: Example schema.xml configurations.
Query Examples: How to call EDisMax in practice.
Detailed Explanations: Behind-the-scenes of scoring and slop.
Keeping q Clean: Using qf to avoid polluting your query string.

1. EDisMax Parameter Overviews

1.1 `qf` (Query Fields)

Purpose: Defines which fields EDisMax searches and at what boost weights.
Why It Matters: Separates “What the user typed” (q) from Solr-specific logic (boosts). Keep your q as pure user intent.

Syntax:

qf=field1^boost1 field2^boost2 ...

field1, field2: field names defined in schema.xml.
boost (float): relative weight for that field.

Example:

qf=title^2.0 description^1.0 content^0.5

1.2 `mm` (Minimum “Should” Match)

Purpose: Controls how many terms in a multi-term query must match in the targeted fields.
Why It Matters: Balances recall vs. precision. Too strict (mm=100%) → few/no results; too loose (mm=0%) → noisy results.

Syntax:

mm=<value>

Absolute: mm=2 (at least 2 terms must match)
Relative: mm=75% (75% of terms must match)
Combined: Supports “N of M” logic: mm=2<90% 5<100%

Example:

mm=2<75% 4<90% 6<100%

Interpretation: For queries up to 2 terms, require 75% (i.e., both terms). For up to 4, require 75%. For up to 6, require 90%. For more than 6, require 100%.

1.3 `pf`, `pf1`, `pf2` (Phrase Fields)

EDisMax can detect phrase matches (adjacent tokens) and boost documents accordingly.

pf
Purpose: Apply a boost when the entire (multi-term) query appears as a contiguous phrase in the field.
Syntax: pf=fieldA^boostA fieldB^boostB ...
Usage: Best for multi-word queries (3+ terms). Combined with ps.
pf2
Purpose: Boost when any two-term phrase (bigram) from the query matches contiguously.
Syntax: pf2=fieldA^boostA fieldB^boostB ...
Usage: Useful to capture important two-word phrases even if the full phrase fails.
pf1
Purpose: Boost when a single-term query appears (helpful for one-word queries to maintain consistency).
Syntax: pf1=fieldA^boostA fieldB^boostB ...
Usage: For single-word queries; can mimic qf behavior but reserved for phrase logic.

1.4 `ps`, `ps1`, `ps2` (Phrase Slops)

Phrase slop determines how far apart terms can be and still count as a phrase.

ps
Slop for pf fields.
Syntax: ps=<integer>
Example: ps=2 allows up to two token moves (e.g., “quick fox” matching “quick brown fox”).
ps2
Slop for pf2 (two-term phrases).
Syntax: ps2=<integer>
Example: ps2=1 — if query is “solar power”, it will still match “power of solar” if only one word moves.
ps1
Slop for pf1 (single-term).
Syntax: ps1=<integer> (though slop rarely matters for single-term).
Example: Use ps1=0 (exact match) or omit.

2. Practical `schema.xml` Definitions

Below is an example schema.xml snippet illustrating how to set up fields commonly used with EDisMax. In this fictional “ClassicBook” index, we have fields for title, author, summary, and content.

<!-- schema.xml excerpt for EDisMax example -->
<schema name="classicbook" version="1.6">
  <!-- Field Types -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Fields -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="string" indexed="true" stored="true"/>

  <field name="summary" type="text_general" indexed="true" stored="true" multiValued="false"/>
  <field name="content" type="text_general" indexed="true" stored="false" multiValued="false"/>

  <!-- Copy Field: Aggregate searchable text -->
  <field name="text_all" type="text_general" indexed="true" stored="false" multiValued="false"/>
  <copyField source="title"      dest="text_all"/>
  <copyField source="author"     dest="text_all"/>
  <copyField source="summary"    dest="text_all"/>
  <copyField source="content"    dest="text_all"/>

  <!-- Default Search Field -->
  <defaultSearchField>text_all</defaultSearchField>

  <!-- Unique Key -->
  <uniqueKey>id</uniqueKey>
</schema>

Notes on Fields:

text_all: Combined field with all searchable content. Use text_all in qf, pf, etc.
Field-Specific Boosting: We will boost title more than content because titles historically matter more to users.

3. Keeping `q` Clean with `qf`

A traditional Solr query might look like:

http://localhost:8983/solr/classicbook/select?q=title:"solar power"^2 summary:"solar power"^1

But that pollutes q with boost logic. Instead, use:

q: raw user text, e.g.: solar power
qf: field boosts
bq, bf: additional boosts (optionally)

Example (clean q):

q=solar power
&defType=edismax
&qf=title^3.0 summary^1.5 text_all^0.5

Explanation:
title matches count triple weight.
summary double weight.
text_all keeps the engine honest but low weight.

Your query string remains user-centric.

4. EDisMax in Action: Query Examples

Below are several search examples illustrating how EDisMax parameters affect results.

4.1 Basic Keyword Search with `qf` and `mm`

Request:

GET /solr/classicbook/select?
  q=ancient philosophy
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=75%
  &hl=true

What Happens:

Tokenization: “ancient” and “philosophy”
mm=75%:
For 2 terms, 75% rounds up → both terms must match (since 75% of 2 = 1.5 → 2 terms).
Field Scoring:
Matches in title count double compared to summary.
Matches in content count minimal.

If a document has “ancient” in title but not “philosophy,” it is excluded (because both must match). Solr returns documents where both words appear, boosting those with title matches.

4.2 Phrase Boosting with `pf` and `ps`

Request:

GET /solr/classicbook/select?
  q=ancient philosophy treatise
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=2<75% 3<90% 4<100%
  &pf=title^5 summary^3
  &ps=2

What Happens:

Term Matching:
mm=2<75%: For 3 terms, need at least 75% → 3 terms * 0.75 = 2.25 → 3 terms.
For 4 terms, need 90%.
Effect: Strict multi-term matching.
Phrase Boost (pf):
If “ancient philosophy treatise” appears [with up to 2-word gaps (ps=2)] in title or summary, a significant boost is applied.
E.g., “ancient philosophical treatise” (one word in between) still qualifies.
Scoring Order:
Exact phrase in title > phrase in summary > term matches alone.

4.3 Two-Term Phrase with `pf2` and `ps2`

Suppose we want to capture strong two-word phrases:

GET /solr/classicbook/select?
  q=quantum mechanics equations
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=2<75% 3<90% 4<100%
  &pf2=title^4 summary^2
  &ps2=1

What Happens:

For any two-term phrase from “quantum mechanics equations” (e.g., “quantum mechanics”, “mechanics equations”):
If “quantum mechanics” appears with slop ≤ 1 in title, boost by 4.
In summary, boost by 2.

Thus, even if the full three-term phrase isn’t present, two-term pairs can surface important context.

4.4 One-Term Phrase with `pf1` and `ps1`

For completeness:

GET /solr/classicbook/select?
  q=philosophy
  &defType=edismax
  &qf=title^2 summary^1 content^0.2
  &mm=1
  &pf1=title^3 summary^1
  &ps1=0

Single-term query “philosophy.”
pf1 boosts documents where “philosophy” appears in title (×3) or summary (×1).
ps1=0 means exact match; no slop needed.

5. Deep Dive into Parameters

5.1 `qf`: The Heart of Clean Queries

<!-- In solrconfig.xml (RequestHandler) -->
<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">
      title^3.0
      summary^1.5
      text_all^0.5
    </str>
    <str name="mm">2&lt;75% 4&lt;90% 6&lt;100%</str>
    <str name="pf">title^4 summary^2</str>
    <str name="ps">2</str>
    <str name="pf2">title^3 summary^1</str>
    <str name="ps2">1</str>
    <str name="pf1">title^2 summary^1</str>
    <str name="ps1">0</str>
  </lst>
</requestHandler>

Why Defaults?
Embedding your qf and mm in defaults ensures consistency across all calls. You can override at query time if needed.
Traditional Tip:
“Once you set sensible default boosts, you save yourself countless hours of tweaking individual requests.”

5.2 `mm`: Balancing Recall vs. Precision

Absolute vs. Percentage:
Use absolute (mm=2) for short queries (2–3 words).
Use relative (mm=50%) for longer queries.
Composite Syntax:
mm=2<75% 4<90% 6<100%:
- Up to 2 terms → 75% → 2 terms must match.
- 3–4 terms → at least 3 matches.
- 5–6 terms → at least 5 matches.
- 7+ terms → 100% matches (strict).

Pro Tip (Traditional Wisdom): If you haven’t set mm and rely on default fuzzy matching, you might end up with the dreaded “too many results” syndrome.

5.3 `pf`, `pf1`, `pf2`: Phrase Boosting Strategies

Why Phrase Boost?
Users often type queries that imply an exact phrase (“sherlock holmes stories”). Reward documents that honor the phrase. It’s like giving a tip to the bartender for a well-made drink—acknowledge exactly what was asked.
pf (General Phrase Boost):
Targets the entire query sequence.
Combined with ps to allow “wiggle room” (slop).
pf2 (Bigram Boost):
Useful when the full phrase fails or user typed a longer phrase.
Captures strong two-word signals.
pf1 (Unigram Boost):
Useful for consistency in single-word queries.
Ensures short queries also benefit from phrase logic.
Fine-Tuning Slop (ps, ps2, ps1):
ps=2: Up to 2-term gaps in phrase.
ps2=1: Up to 1-term gap in bigrams.
ps1=0: No gap for single-term (exact).

5.4 Putting It All Together: Sample `solrconfig.xml`

<config>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>

      <!-- Core Query Fields -->
      <str name="qf">
        title^3.0
        summary^1.5
        text_all^0.5
      </str>

      <!-- Minimum “Should” Match -->
      <str name="mm">2&lt;75% 4&lt;90% 6&lt;100%</str>

      <!-- Phrase Boosts -->
      <str name="pf">title^5 summary^3</str>
      <str name="ps">2</str>
      <str name="pf2">title^4 summary^2</str>
      <str name="ps2">1</str>
      <str name="pf1">title^3 summary^1</str>
      <str name="ps1">0</str>

      <!-- Highlighting Defaults -->
      <str name="hl">true</str>
      <str name="hl.fl">title,summary,content</str>
    </lst>
  </requestHandler>
</config>

Explanation:

qf: Splits search across fields, boosting title most.
mm: Balances how many terms must match.
pf: Big phrase boost to reward exact (or near-exact) matches.
pf2: Two-word phrase boost, capturing key bigrams.
pf1: Single-term phrase boost, ensuring one-word queries still get a leg up.
ps, ps2, ps1: Slop controls—allowing some wiggle but not too much.

6. Advanced Examples & Scenarios

6.1 Long Queries with `pf2` and `pf`

Scenario:
A user queries: “quantum field theory experiments at low temperatures”.

GET /solr/classicbook/select?
  q=quantum field theory experiments at low temperatures
  &defType=edismax
  &qf=title^2 summary^1 text_all^0.3
  &mm=4&lt;75% 7&lt;90% 10&lt;100%
  &pf=title^5 summary^3
  &ps=2
  &pf2=title^4 summary^2
  &ps2=1

Term Count: 7 terms.
mm: of 7 terms, require 90% → 7 × 0.9 = 6.3 → 7 terms. (Strict)
Phrase Logic:
If “quantum field theory experiments” appears (with ≤ 2-term slop) in title, massive boost.
Even if that fails, any two-word pairs like “quantum field” or “field theory” get a smaller boost.

Vintage Wisdom: Back in the day, we hand-crafted this logic with multiple q clauses. Now, EDisMax elegantly bundles it.

6.2 Query-Time Overrides

Though defaults serve most cases, you can override parameters on the fly:

GET /solr/classicbook/select?
  q=renaissance art paintings
  &defType=edismax
  &qf=title^4 summary^2 text_all^0.4
  &pf=title^6 summary^3
  &ps=3
  &mm=2&lt;80% 5&lt;90% 8&lt;100%
  &rows=20
  &sort=score desc, publish_date desc

Query-Specific Boosts: Bump title further for “art”-centric queries.
Phrase Slop: Increased to 3 to allow permutations like “art of renaissance paintings”.
Sorting: Traditional “score then date” ordering.

7. Best Practices and Traditional Tips

Set Sensible Defaults:
Tune qf and mm in your solrconfig.xml defaults. Overrides are the exception, not the rule.
Phrase Boosting Wisdom:
Start with pf + ps=2. See if that yields decent results. Add pf2 if long queries are common.
Resist the urge to crank ps too high—phrase logic gets too loose.
Keep q Pure:
Always let q be user-provided text. Place boosting/logic in separate parameters like qf, bq, bf.
Iterate with Analytics:
Review Solr’s query logs. Are users often getting no results? Adjust mm.
Are phrase matches failing? Tweak ps or add pf2.
Legacy Reminder:
If you remember basic Lucene queries (+title:(ancient philosophy) +summary:(ancient philosophy)), EDisMax is your elegant shortcut. Honor the old ways by acknowledging them, but embrace convenience.

8. Summary

EDisMax streamlines complex boolean and phrase-scoring logic into a cohesive, user-friendly API. By mastering:

qf: Keep your query field mappings clear and boost weights sensible.
mm: Balance recall and precision with thoughtful minimum-match.
pf, pf1, pf2, ps, ps1, ps2: Elevate phrase relevance without over-engineering.

…you retain the “traditional Solr craftsmanship” while leveraging modern conveniences. May your search results be ever relevant, your phrase boosts well-balanced, and your schema forever flexible.

Written with respect for the traditions of Solr and a wink to the days of manual Lucene queries. Enjoy!

Read All...

Autocomplete in Solr and EdgeNGram Fields - the eCommerce helper

Solr `edgy_text` and `edgeNgram` Explanation

1. `edgy_text` Field Type:

The edgy_text field type typically uses Edge N-gram tokenization to create prefix-based tokens for indexing. This means the field is indexed in a way that allows matching partial terms or phrases, improving search results, especially for autocomplete or search-as-you-type functionality.

What is Edge N-gram Tokenization?

Edge N-grams are tokens that represent progressively longer prefixes of a word. For example, if you tokenize the word “example”, the Edge N-gram tokenizer could produce tokens like:
- “e”
- “ex”
- “exa”
- “exam”
- “examp”
- “exampl”
- “example”

This is useful for autocompletion, as it allows users to search by just typing the beginning of a word (or a term) and get matches based on the indexed prefixes.

How `edgy_text` works in Solr:

The edgy_text field type in Solr is usually implemented using the Edge N-gram filter. This filter generates prefixes of the input text and indexes those prefixes. It works well for fields where you want to allow prefix-based matching or autocomplete-like behavior.

2. `edgeNgram` Field Type:

The edgeNgram filter, when used with a Solr field type, can be applied to a field to tokenize the input data (like product names, categories, or SKUs) into progressively longer prefixes of words.

For Example:

If a field like product_name contains “laptop”, the edgeNgram filter might produce the following tokens: - “l” - “la” - “lap” - “lapt” - “lapto” - “exampl” - “laptop”

This allows a search query like “lap” to match “laptop”, making it perfect for search-as-you-type or autocomplete features.

Example of Edge Ngram in Solr:

Here’s how you can configure both edgy_text and edgeNgram in Solr:

1. `edgy_text` FieldType Definition:

This edgy_text field type uses the Edge N-gram filter to break words into progressively longer prefixes.

<fieldType name="edgy_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Whitespace tokenizer to split the text into tokens based on spaces -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Edge N-gram filter to create prefixes from each word -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Similar tokenization for query analysis -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

minGramSize="1": Specifies that the smallest prefix (starting point) will be of length 1 (e.g., “l” from “laptop”).
maxGramSize="25": Limits the length of the token to a maximum of 25 characters. You can adjust this based on your needs.

2. Using `edgy_text` for Fields:

Once you’ve defined this field type, you can apply it to any field where you want to enable prefix matching. For instance, for a product name or SKU:

<field name="product_name" type="edgy_text" indexed="true" stored="true"/>
<field name="sku" type="edgy_text" indexed="true" stored="true"/>

These fields now support partial matching for searches, such as a query for “lap” returning “laptop”.

3. Benefits:

Prefix matching: Users can search by typing partial words, and Solr will return all results that match the prefix.
Autocomplete: This can be used to create search-as-you-type features in your application, where suggestions appear as users type.
Efficient search: Since Solr indexes only prefixes, it can efficiently match terms, especially with large datasets.

4. Use Case:

Consider an e-commerce site where users are typing product names in a search box. As the user types the prefix of the product name (e.g., “lap” for “laptop”), Solr can quickly return relevant results because the field is indexed with all possible prefixes of the product name. This results in faster search suggestions and matches.

Conclusion:

edgy_text with Edge N-gram: This combination helps create an efficient, powerful search feature for prefix-based matches, such as autocompletion.
Use Cases: Ideal for search-as-you-type functionality, improving user experience by providing real-time search suggestions or partial matches.

Read All...

Best Fulltext Solr Fields

How to Calibrate Solr Search for Optimal Results with `text_general` and `text_general_phonetic` Fields

Fine-tuning your Solr search queries is essential for getting the most relevant results. By using the text_general and text_general_phonetic field types, you can significantly enhance the quality of your search output.

Solr Field Types: `text_general` & `text_general_phonetic`

`text_general` Field

This field type is designed to handle general text content, such as titles, descriptions, and body text. It provides a powerful set of analyzers:

HTML Stripping: Strips HTML tags from the content to clean it up.
Unicode Tokenization: Breaks the text into tokens using an ICU tokenizer.
Word Delimiters: Handles word and number splitting to ensure proper tokenization.
Stopword Removal: Removes common stopwords from the text for more relevant results.
Stemming: Reduces words to their root form, making searches more accurate.

`text_general_phonetic` Field

The text_general_phonetic field type is perfect for phonetic searches, where sound-based similarity matters. It is similar to text_general, but with added functionality for phonetic matching:

Beider-Morse Phonetic Encoding: Encodes words based on their sound, allowing for sound-based matching.
Word Delimiters & Tokenization: Ensures that tokens are split appropriately for phonetic coding.
Synonym Support: Can expand synonyms during query time, improving search flexibility.
Accents Handling: Accents are preserved for better phonetic matching.

Optimizing Search Parameters for Best Results

Once you have your field types set up, the next step is to calibrate the search parameters for optimal results. Solr provides several parameters that allow you to adjust your search behavior, including mm, qf, and bf.

Understanding the Parameters

mm (Minimum Should Match): This parameter defines the minimum percentage of terms in the query that must match. It’s essential to adjust this based on the number of words in the query to balance precision and recall.
qf (Query Fields): Defines which fields to query and assigns boosting factors. The more relevant fields are boosted higher for better precision.
bf (Boost Functions): Allows you to boost the results based on specific functions, such as freshness or recency of documents.

Example: Query Parameter Calibration

Here’s an example of how to calibrate the parameters:

params["qf"] = "title^10 description^7 text^5 phonetic_title^0.3 phonetic_description^0.2 phonetic_text^0.1";
params["mm"] = "75%";
params["bf"] = "recip(rord(timestamp),1,1500,1500)^29";

In this example, the query boosts the title field heavily and adjusts the match percentage to 75%. Additionally, the bf function boosts more recent documents.

Field Definitions and Example Queries

Here are the actual field definitions for the text_general and text_general_phonetic field types:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <!-- ================= INDEX‐TIME ANALYZER (English) ================= -->
      <analyzer type="index">
        <!-- 1. Strip HTML and fold accented characters (e.g. “résumé”→“resume”) -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

        <!-- 2. Normalize comma/dot decimals: “5,8” → “5.8” -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="([0-9])[\\.,]([0-9])"
                    replacement="$1.$2"/>

        <!-- 3. Break text into Unicode words & numbers -->
        <tokenizer class="solr.ICUTokenizerFactory"/>

        <!-- 4. Split numbers/words but keep originals; protect tokens in protwords.txt -->
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateAll="0"
                catenateNumbers="1"
                catenateWords="0"
                splitOnCaseChange="1"
                preserveOriginal="1"
                protected="protwords.txt"/>

        <!-- 5. Discard tokens that are too short/long -->
        <filter class="solr.LengthFilterFactory" min="1" max="50" />

        <!-- 6. Fold any remaining accents (keep original) -->
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>

        <!-- 7. Lowercase everything -->
        <filter class="solr.LowerCaseFilterFactory"/>

        <!-- 8. Remove English stopwords (stopwords.txt should now contain English list) -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

        <!-- 9. Remove English possessive ’s  (“John’s”→“John”) -->
        <filter class="solr.EnglishPossessiveFilterFactory"/>

        <!-- 10. Apply English SnowballPorter stemming, protecting protwords.txt -->
        <filter class="solr.SnowballPorterFilterFactory"
                language="English"
                protected="protwords.txt"/>

        <!-- 11. Remove any duplicate tokens -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

      <!-- ================= QUERY‐TIME ANALYZER (English) ================= -->
      <analyzer type="query">
        <!-- 1. Strip HTML and fold accented characters -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

        <!-- 2. Normalize comma/dot decimals at query time -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="([0-9])[\\.,]([0-9])"
                    replacement="$1.$2"/>

        <!-- 3. ICU tokenizer for Unicode words & numbers -->
        <tokenizer class="solr.ICUTokenizerFactory"/>

        <!-- 4. Split numbers/words but keep originals; protect protwords.txt -->
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateAll="0"
                catenateNumbers="1"
                catenateWords="0"
                splitOnCaseChange="1"
                preserveOriginal="1"
                protected="protwords.txt"/>

        <!-- 5. Discard tokens that are too short/long -->
        <filter class="solr.LengthFilterFactory" min="1" max="50"/>

        <!-- 6. Fold any remaining accents (keep original) -->
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>

        <!-- 7. Lowercase everything -->
        <filter class="solr.LowerCaseFilterFactory"/>

        <!-- 8. Expand synonyms before removing stopwords -->
        <filter class="solr.SynonymGraphFilterFactory"
                expand="true"
                ignoreCase="true"
                synonyms="synonyms.txt"/>

        <!-- 9. Remove English stopwords (stopwords.txt) -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

        <!-- 10. Remove English possessive ’s -->
        <filter class="solr.EnglishPossessiveFilterFactory"/>

        <!-- 11. Apply English SnowballPorter stemming, protecting protwords.txt -->
        <filter class="solr.SnowballPorterFilterFactory"
                language="English"
                protected="protwords.txt"/>

        <!-- 12. Remove any duplicate tokens -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
    </fieldType>


    <!--Phonetic Text Field-->
    <fieldType name="text_general_phonetic" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <!-- 1) Strip HTML -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>

        <!-- 2) Tokenize on Unicode word boundaries rather than bare whitespace -->
        <!--    WhitespaceTokenizer will treat “Co‐op” as one token, but you probably want “Co” + “op”. -->
        <tokenizer class="solr.ICUTokenizerFactory"/>

        <!-- 3) Remove stopwords early on -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

        <!-- 4) Break apart numbers/words but keep the original spelling for phonetic coding -->
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" splitOnNumerics="1" splitOnCaseChange="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
                preserveOriginal="1"
                protected="protwords.txt" />

        <!-- 5) Lowercase now so phonetic sees normalized input -->
        <filter class="solr.LowerCaseFilterFactory"/>



        <filter class="solr.PorterStemFilterFactory"/>


        <!-- 7) Fold accents (but keep originals so BeiderMorse sees both accented & un-accented) -->
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>

        <!-- 8) Synonyms (optional—but note: synonyms + phonetics = explosion of tokens) -->
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="false"/>

        <!-- 9) Phonetic coding: only keep **one** code per token if possible -->
        <!--    nameType="GENERIC" ruleType="APPROX" is fine, but “concat=true” will glue codes together. -->
        <!--    For better control, set concat="false" so each code is its own token. -->
        <filter class="solr.BeiderMorseFilterFactory"
                nameType="GENERIC"
                ruleType="APPROX"
                concat="false"
                languageSet="auto"/>
      </analyzer>

      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" splitOnNumerics="1" splitOnCaseChange="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
                preserveOriginal="1"
                protected="protwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.BeiderMorseFilterFactory"
                nameType="GENERIC"
                ruleType="APPROX"
                concat="false"
                languageSet="auto"/>
      </analyzer>
    </fieldType>

Conclusion Calibrating Solr’s search parameters for specific field types like text_general and text_general_phonetic ensures that you’re getting the most relevant results from your searches. By adjusting key parameters like mm, qf, and bf, you can refine your search queries and achieve optimal performance tailored to your needs.

Read All...

Documentation

The Solr edismax query Parser. How it works and how it helps.

Solr EDisMax Query Parser Guide

Introduction

1. EDisMax Parameter Overviews

1.1 qf (Query Fields)

1.2 mm (Minimum “Should” Match)

1.3 pf, pf1, pf2 (Phrase Fields)

1.4 ps, ps1, ps2 (Phrase Slops)

2. Practical schema.xml Definitions

3. Keeping q Clean with qf

4. EDisMax in Action: Query Examples

4.1 Basic Keyword Search with qf and mm

4.2 Phrase Boosting with pf and ps

4.3 Two-Term Phrase with pf2 and ps2

4.4 One-Term Phrase with pf1 and ps1

5. Deep Dive into Parameters

5.1 qf: The Heart of Clean Queries

5.2 mm: Balancing Recall vs. Precision

5.3 pf, pf1, pf2: Phrase Boosting Strategies

5.4 Putting It All Together: Sample solrconfig.xml

6. Advanced Examples & Scenarios

6.1 Long Queries with pf2 and pf

6.2 Query-Time Overrides

7. Best Practices and Traditional Tips

8. Summary

Autocomplete in Solr and EdgeNGram Fields - the eCommerce helper

Solr edgy_text and edgeNgram Explanation

1. edgy_text Field Type:

What is Edge N-gram Tokenization?

How edgy_text works in Solr:

2. edgeNgram Field Type:

For Example:

Example of Edge Ngram in Solr:

1. edgy_text FieldType Definition:

2. Using edgy_text for Fields:

3. Benefits:

4. Use Case:

Conclusion:

Best Fulltext Solr Fields

How to Calibrate Solr Search for Optimal Results with text_general and text_general_phonetic Fields

Solr Field Types: text_general & text_general_phonetic

text_general Field

text_general_phonetic Field

Optimizing Search Parameters for Best Results

Understanding the Parameters

Example: Query Parameter Calibration

Field Definitions and Example Queries

1.1 `qf` (Query Fields)

1.2 `mm` (Minimum “Should” Match)

1.3 `pf`, `pf1`, `pf2` (Phrase Fields)

1.4 `ps`, `ps1`, `ps2` (Phrase Slops)

2. Practical `schema.xml` Definitions

3. Keeping `q` Clean with `qf`

4.1 Basic Keyword Search with `qf` and `mm`

4.2 Phrase Boosting with `pf` and `ps`

4.3 Two-Term Phrase with `pf2` and `ps2`

4.4 One-Term Phrase with `pf1` and `ps1`

5.1 `qf`: The Heart of Clean Queries

5.2 `mm`: Balancing Recall vs. Precision

5.3 `pf`, `pf1`, `pf2`: Phrase Boosting Strategies

5.4 Putting It All Together: Sample `solrconfig.xml`

6.1 Long Queries with `pf2` and `pf`

Solr `edgy_text` and `edgeNgram` Explanation

1. `edgy_text` Field Type:

How `edgy_text` works in Solr:

2. `edgeNgram` Field Type:

1. `edgy_text` FieldType Definition:

2. Using `edgy_text` for Fields:

How to Calibrate Solr Search for Optimal Results with `text_general` and `text_general_phonetic` Fields

Solr Field Types: `text_general` & `text_general_phonetic`

`text_general` Field

`text_general_phonetic` Field