How to Calibrate Solr Search for Optimal Results with text_general and text_general_phonetic Fields

Fine-tuning your Solr search queries is essential for getting the most relevant results. The text_general and text_general_phonetic field types, combined with a few query parameters, let you match on exact wording and on similar-sounding terms at the same time.

Solr Field Types: text_general & text_general_phonetic

text_general Field

This field type is designed to handle general text content, such as titles, descriptions, and body text. Its analyzer chain applies the following steps:

  • HTML Stripping: Strips HTML tags from the content to clean it up.
  • Unicode Tokenization: Breaks the text into tokens using an ICU tokenizer.
  • Word Delimiters: Splits tokens on case changes and on boundaries between letters and digits, while also keeping the original token.
  • Stopword Removal: Removes common stopwords from the text for more relevant results.
  • Stemming: Reduces words to their stem, so that inflected forms of a word (for example “searching” and “searched”) match the same indexed term.
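One step of the text_general definition shown later on this page is easy to check outside Solr: the char filter that normalizes decimal separators. The sketch below applies an equivalent regular expression with Python’s re module. The function name normalize_decimals is made up for illustration, and Python writes the replacement as \1.\2 where Solr’s Java regex uses $1.$2.

```python
import re

# Same idea as the PatternReplaceCharFilterFactory step in text_general:
# a comma or dot that sits between two digits becomes a dot.
DECIMAL_SEPARATOR = re.compile(r"([0-9])[.,]([0-9])")


def normalize_decimals(text: str) -> str:
    """Rewrite digit,digit and digit.digit as digit.digit."""
    return DECIMAL_SEPARATOR.sub(r"\1.\2", text)


print(normalize_decimals("5,8 inch display"))  # 5.8 inch display
print(normalize_decimals("priced at 1,000"))   # priced at 1.000
print(normalize_decimals("red, green"))        # red, green (no digits around the comma)
```

Note that thousands separators are rewritten as well, which is worth knowing if your content contains numbers such as 1,000.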

text_general_phonetic Field

The text_general_phonetic field type is intended for searches where sound-based similarity matters, such as misspelled or transliterated names. It follows the same general approach as text_general, with phonetic encoding added:

  • Beider-Morse Phonetic Encoding: Encodes words based on their sound, allowing for sound-based matching.
  • Word Delimiters & Tokenization: Ensures that tokens are split appropriately for phonetic coding.
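  • Synonym Support: Applies synonyms.txt at both index and query time. In the definition below, expand is set to false, so each synonym group is reduced to a single term rather than expanded, which limits the number of tokens passed to the phonetic encoder.
  • Accents Handling: Accents are folded to ASCII while the original accented token is kept, so the phonetic encoder sees both forms.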
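To see what either field type does to a given string, Solr’s field analysis handler can run text through the index and query analyzers and report the tokens produced at each step. The sketch below only builds the request URL. It assumes the handler is available at its default /analysis/field path; the host and the index name my_index are placeholders, and analysis_url is a helper name invented for this example.

```python
from urllib.parse import urlencode

# Hypothetical values: replace with your own Solr host and index name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "my_index"


def analysis_url(field_type: str, index_text: str, query_text: str) -> str:
    """Build a URL for Solr's field analysis handler (/analysis/field)."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": index_text,  # run through the index analyzer
        "analysis.query": query_text,       # run through the query analyzer
        "wt": "json",
    }
    return f"{SOLR_BASE}/{CORE}/analysis/field?{urlencode(params)}"


url = analysis_url("text_general_phonetic", "Schmidt", "Shmit")
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) shows whether the two spellings end up with overlapping phonetic tokens.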

Optimizing Search Parameters for Best Results

Once you have your field types set up, the next step is to calibrate the search parameters. The DisMax and eDisMax query parsers provide several parameters for adjusting search behavior, including mm, qf, and bf.

Understanding the Parameters

  • mm (Minimum Should Match): Defines how many of the optional query terms must match, given as an absolute number or as a percentage. Tune it against the typical number of words in a query to balance precision and recall: a high value on long queries can exclude otherwise good documents.
  • qf (Query Fields): Lists the fields to search and an optional boost for each, written as field^boost. Give the most relevant fields the highest boosts.
  • bf (Boost Functions): Adds the value of a function query to each document’s score, for example to favor recent documents.
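The effect of a percentage mm depends on query length, which a little arithmetic makes concrete. The sketch below is a simplified model of the documented rule (the number computed from the percentage is rounded down); it is not Solr’s own code, the function name min_should_match is invented here, and conditional specs such as "2<75%" are deliberately left out.

```python
def min_should_match(optional_clauses: int, mm: str) -> int:
    """Number of optional clauses that must match for a simple mm spec.

    Handles only the plain forms "3", "-1", "75%" and "-25%".
    """
    if mm.endswith("%"):
        value = int(mm[:-1])
        # The number computed from the percentage is rounded down.
        amount = optional_clauses * abs(value) // 100
    else:
        value = int(mm)
        amount = abs(value)
    # A negative spec states how many clauses may be missing.
    required = optional_clauses - amount if value < 0 else amount
    # Never require more clauses than exist, and never fewer than zero.
    return max(0, min(optional_clauses, required))


for n in (2, 3, 4, 5):
    print(n, "terms ->", min_should_match(n, "75%"), "must match")
```

With mm set to 75%, a two-word query needs one matching term, a three-word query needs two, and both four-word and five-word queries need three.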

Example: Query Parameter Calibration

Here’s an example of how to calibrate the parameters:

params["qf"] = "title^10 description^7 text^5 phonetic_title^0.3 phonetic_description^0.2 phonetic_text^0.1";
params["mm"] = "75%";
params["bf"] = "recip(rord(timestamp),1,1500,1500)^29";

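In this example, matches in title count most (boost 10), followed by description (7) and text (5). The phonetic fields get boosts well below 1, so sound-alike matches contribute only a small amount to the score. The mm value requires 75% of the query terms to match, and the bf function adds a boost that is larger for more recent documents.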
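To make the bf expression less abstract, the sketch below evaluates recip(x,m,a,b), which Solr defines as a / (m*x + b), for a few recency ranks and encodes the example parameters as a query string. Several things here are assumptions for illustration only: the eDisMax parser (defType), the user query "solar panels", and the reading that rord(timestamp) is 1 for the newest document.

```python
from urllib.parse import urlencode


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip(x, m, a, b) function: a / (m * x + b)."""
    return a / (m * x + b)


# Parameters from the example above, plus a parser and a user query.
params = {
    "defType": "edismax",   # assumption: the eDisMax query parser is used
    "q": "solar panels",    # hypothetical user query
    "qf": "title^10 description^7 text^5 "
          "phonetic_title^0.3 phonetic_description^0.2 phonetic_text^0.1",
    "mm": "75%",
    "bf": "recip(rord(timestamp),1,1500,1500)^29",
}
query_string = urlencode(params)  # ready to append to /select?

# Approximate additive boost at different recency ranks,
# assuming rord(timestamp) is 1 for the newest document.
for rank in (1, 1500, 15000):
    print(rank, round(29 * recip(rank, 1, 1500, 1500), 2))
```

Under these assumptions the newest document receives a boost close to 29, the 1500th newest receives half of that, and the boost keeps shrinking for older documents without ever reaching zero.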

Field Definitions and Example Queries

Here are the actual field definitions for the text_general and text_general_phonetic field types:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <!-- ================= INDEX‐TIME ANALYZER (English) ================= -->
      <analyzer type="index">
        <!-- 1. Strip HTML and fold accented characters (e.g. “résumé”→“resume”) -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

        <!-- 2. Normalize comma/dot decimals: “5,8” → “5.8” -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="([0-9])[.,]([0-9])"
                    replacement="$1.$2"/>

        <!-- 3. Break text into Unicode words & numbers -->
        <tokenizer class="solr.ICUTokenizerFactory"/>

        <!-- 4. Split numbers/words but keep originals; protect tokens in protwords.txt. -->
        <!--    At index time a graph filter must be followed by FlattenGraphFilterFactory. -->
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateAll="0"
                catenateNumbers="1"
                catenateWords="0"
                splitOnCaseChange="1"
                preserveOriginal="1"
                protected="protwords.txt"/>
        <filter class="solr.FlattenGraphFilterFactory"/>

        <!-- 5. Discard tokens that are too short/long -->
        <filter class="solr.LengthFilterFactory" min="1" max="50" />

        <!-- 6. Fold any remaining accents (keep original) -->
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>

        <!-- 7. Lowercase everything -->
        <filter class="solr.LowerCaseFilterFactory"/>

        <!-- 8. Remove English stopwords (stopwords.txt should now contain English list) -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

        <!-- 9. Remove English possessive ’s  (“John’s”→“John”) -->
        <filter class="solr.EnglishPossessiveFilterFactory"/>

        <!-- 10. Apply English SnowballPorter stemming, protecting protwords.txt -->
        <filter class="solr.SnowballPorterFilterFactory"
                language="English"
                protected="protwords.txt"/>

        <!-- 11. Remove any duplicate tokens -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

      <!-- ================= QUERY‐TIME ANALYZER (English) ================= -->
      <analyzer type="query">
        <!-- 1. Strip HTML and fold accented characters -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

        <!-- 2. Normalize comma/dot decimals at query time -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="([0-9])[.,]([0-9])"
                    replacement="$1.$2"/>

        <!-- 3. ICU tokenizer for Unicode words & numbers -->
        <tokenizer class="solr.ICUTokenizerFactory"/>

        <!-- 4. Split numbers/words but keep originals; protect protwords.txt -->
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateAll="0"
                catenateNumbers="1"
                catenateWords="0"
                splitOnCaseChange="1"
                preserveOriginal="1"
                protected="protwords.txt"/>

        <!-- 5. Discard tokens that are too short/long -->
        <filter class="solr.LengthFilterFactory" min="1" max="50"/>

        <!-- 6. Fold any remaining accents (keep original) -->
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>

        <!-- 7. Lowercase everything -->
        <filter class="solr.LowerCaseFilterFactory"/>

        <!-- 8. Expand synonyms before removing stopwords -->
        <filter class="solr.SynonymGraphFilterFactory"
                expand="true"
                ignoreCase="true"
                synonyms="synonyms.txt"/>

        <!-- 9. Remove English stopwords (stopwords.txt) -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

        <!-- 10. Remove English possessive ’s -->
        <filter class="solr.EnglishPossessiveFilterFactory"/>

        <!-- 11. Apply English SnowballPorter stemming, protecting protwords.txt -->
        <filter class="solr.SnowballPorterFilterFactory"
                language="English"
                protected="protwords.txt"/>

        <!-- 12. Remove any duplicate tokens -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
    </fieldType>


    <!--Phonetic Text Field-->
    <fieldType name="text_general_phonetic" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <!-- 1) Strip HTML -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>

        <!-- 2) Tokenize on Unicode word boundaries rather than bare whitespace -->
        <!--    WhitespaceTokenizer will treat “Co‐op” as one token, but you probably want “Co” + “op”. -->
        <tokenizer class="solr.ICUTokenizerFactory"/>

        <!-- 3) Remove stopwords early on -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

        <!-- 4) Break apart numbers/words but keep the original spelling for phonetic coding. -->
        <!--    At index time a graph filter must be followed by FlattenGraphFilterFactory. -->
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" splitOnNumerics="1" splitOnCaseChange="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
                preserveOriginal="1"
                protected="protwords.txt" />
        <filter class="solr.FlattenGraphFilterFactory"/>

        <!-- 5) Lowercase now so phonetic sees normalized input -->
        <filter class="solr.LowerCaseFilterFactory"/>



        <!-- 6) Apply Porter stemming. Note: the query analyzer below has no matching stemmer. -->
        <filter class="solr.PorterStemFilterFactory"/>


        <!-- 7) Fold accents (but keep originals so BeiderMorse sees both accented & un-accented) -->
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>

        <!-- 8) Synonyms (optional—but note: synonyms + phonetics = explosion of tokens) -->
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="false"/>

        <!-- 9) Phonetic coding: only keep **one** code per token if possible -->
        <!--    nameType="GENERIC" ruleType="APPROX" is fine, but “concat=true” will glue codes together. -->
        <!--    For better control, set concat="false" so each code is its own token. -->
        <filter class="solr.BeiderMorseFilterFactory"
                nameType="GENERIC"
                ruleType="APPROX"
                concat="false"
                languageSet="auto"/>
      </analyzer>

      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" splitOnNumerics="1" splitOnCaseChange="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
                preserveOriginal="1"
                protected="protwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.BeiderMorseFilterFactory"
                nameType="GENERIC"
                ruleType="APPROX"
                concat="false"
                languageSet="auto"/>
      </analyzer>
    </fieldType>
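The qf example earlier on this page refers to fields named title, description, and text, together with phonetic_title, phonetic_description, and phonetic_text. One plausible way to wire these to the two field types is to index each source field as text_general and copy it into a text_general_phonetic twin. The sketch below builds a Schema API payload for that layout. The layout itself, the stored/indexed settings, and the helper name schema_commands are assumptions; the Schema API also requires a managed schema, and with a hand-edited schema.xml the equivalent would be field and copyField elements.

```python
import json

# Field names taken from the qf example; the wiring below is an assumption.
TEXT_FIELDS = ["title", "description", "text"]


def schema_commands(fields):
    """Build a Schema API payload: one text_general field per name, plus a
    phonetic_<name> copy that uses text_general_phonetic."""
    add_fields, copy_fields = [], []
    for name in fields:
        add_fields.append({"name": name, "type": "text_general",
                           "indexed": True, "stored": True})
        # The phonetic twin is only searched, so it does not need to be stored.
        add_fields.append({"name": f"phonetic_{name}",
                           "type": "text_general_phonetic",
                           "indexed": True, "stored": False})
        copy_fields.append({"source": name, "dest": f"phonetic_{name}"})
    return {"add-field": add_fields, "add-copy-field": copy_fields}


payload = schema_commands(TEXT_FIELDS)
print(json.dumps(payload, indent=2))
```

The printed JSON could be sent as a POST body to the index’s /schema endpoint. Because copyField passes the original input rather than analyzed tokens, each phonetic field runs its own analyzer chain on the raw text.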

Conclusion

Calibrating Solr’s search parameters for field types like text_general and text_general_phonetic helps surface the most relevant results. Adjusting mm, qf, and bf lets you tune precision, recall, and recency to fit your own data and queries.





