Autocomplete in Solr and EdgeNGram Fields - the eCommerce helper

Solr edgy_text and edgeNgram Explanation

1. edgy_text Field Type:

The edgy_text field type typically uses Edge N-gram tokenization to create prefix-based tokens for indexing. This means the field is indexed in a way that allows matching partial terms or phrases, improving search results, especially for autocomplete or search-as-you-type functionality.

What is Edge N-gram Tokenization?

  • Edge N-grams are tokens that represent progressively longer prefixes of a word. For example, if you tokenize the word “example”, the Edge N-gram tokenizer could produce tokens like:
    • “e”
    • “ex”
    • “exa”
    • “exam”
    • “examp”
    • “exampl”
    • “example”

This is useful for autocompletion, as it allows users to search by just typing the beginning of a word (or a term) and get matches based on the indexed prefixes.

How edgy_text works in Solr:

The edgy_text field type in Solr is usually implemented using the Edge N-gram filter. This filter generates prefixes of the input text and indexes those prefixes. It works well for fields where you want to allow prefix-based matching or autocomplete-like behavior.

2. edgeNgram Filter:

The edgeNgram filter (solr.EdgeNGramFilterFactory) is not a field type on its own; it is added to the analyzer chain of a field type. It turns each token of the input data (such as product names, categories, or SKUs) into a series of progressively longer prefixes.

For Example:

If a field like product_name contains “laptop”, the edgeNgram filter might produce the following tokens:
  • “l”
  • “la”
  • “lap”
  • “lapt”
  • “lapto”
  • “laptop”

This allows a search query like “lap” to match “laptop”, making it perfect for search-as-you-type or autocomplete features.

Example of Edge Ngram in Solr:

Here’s how you can configure both edgy_text and edgeNgram in Solr:

1. edgy_text FieldType Definition:

This edgy_text field type uses the Edge N-gram filter to break words into progressively longer prefixes.

<fieldType name="edgy_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Standard tokenizer splits the text into word tokens on whitespace and punctuation -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Edge N-gram filter to create prefixes from each word -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No Edge N-gram filter at query time: the typed text is matched as-is against the indexed prefixes.
         N-gramming the query as well would turn "lap" into "l", "la", "lap" and match every word starting with "l". -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
  • minGramSize="1": The shortest prefix that is indexed has length 1 (e.g., “l” from “laptop”).
  • maxGramSize="25": The longest prefix that is indexed has 25 characters; for longer words, only prefixes up to that length are generated. Adjust this to the longest term your users are likely to type.
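
The two size limits can be tuned to the data. As a minimal sketch, assuming single-character suggestions are not needed and that no term of interest is longer than 15 characters (both values are illustrative, not a recommendation), the index-time filter line could look like this:

```xml
<!-- Hypothetical variant of the index-time filter; the rest of the fieldType is unchanged -->
<!-- "laptop" is indexed as: la, lap, lapt, lapto, laptop (no single-character token "l") -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
```

A higher minGramSize produces fewer tokens per word and therefore a smaller index. Documents need to be reindexed after the field type changes.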

2. Using edgy_text for Fields:

Once you’ve defined this field type, you can apply it to any field where you want to enable prefix matching. For instance, for a product name or SKU:

<field name="product_name" type="edgy_text" indexed="true" stored="true"/>
<field name="sku" type="edgy_text" indexed="true" stored="true"/>

These fields now support partial matching for searches, such as a query for “lap” returning “laptop”.
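
To try the fields out, a document can be indexed with Solr's XML update format. The sketch below assumes the schema also defines an id field as its unique key; the document values are made up for illustration.

```xml
<!-- Sample document for the core's /update handler; all values are hypothetical -->
<add>
  <doc>
    <field name="id">1001</field>
    <!-- Indexed prefixes: l, la, lap, lapt, lapto, laptop and s, st, sta, stan, stand -->
    <field name="product_name">Laptop Stand</field>
    <field name="sku">LS200</field>
  </doc>
</add>
```

After a commit, a query such as product_name:lap or product_name:sta should return this document.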

3. Benefits:

  • Prefix matching: Users can search by typing partial words, and Solr will return all results that match the prefix.
  • Autocomplete: This can be used to create search-as-you-type features in your application, where suggestions appear as users type.
  • Efficient search: Because the prefixes are computed at index time, a prefix lookup is an ordinary term match rather than a wildcard scan, which stays fast on large datasets. The trade-off is a larger index.

4. Use Case:

Consider an e-commerce site where users are typing product names in a search box. As the user types the prefix of the product name (e.g., “lap” for “laptop”), Solr can quickly return relevant results because the field is indexed with all possible prefixes of the product name. This results in faster search suggestions and matches.
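
One way to wire this into a search box is a dedicated request handler in solrconfig.xml. This is a sketch under assumptions: the handler name /autocomplete and the chosen defaults are hypothetical, and it reuses the product_name and sku fields defined earlier.

```xml
<requestHandler name="/autocomplete" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- edismax searches the typed text across the fields listed in qf -->
    <str name="defType">edismax</str>
    <str name="qf">product_name sku</str>
    <!-- Return only a handful of suggestions and only the fields the UI needs -->
    <int name="rows">10</int>
    <str name="fl">product_name,sku</str>
  </lst>
</requestHandler>
```

The front end would then send the text typed so far, for example /autocomplete?q=lap, and render the returned product names as suggestions.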

Conclusion:

  • edgy_text with Edge N-gram: This combination helps create an efficient, powerful search feature for prefix-based matches, such as autocompletion.
  • Use Cases: Ideal for search-as-you-type functionality, improving user experience by providing real-time search suggestions or partial matches.




